This is the R Markdown script, written in RStudio (2023.09.0+463 “Desert Sunflower” Release), used to summarise the systematic map database from Martin et al. 2024 “Evidence of the impacts of pharmaceuticals on aquatic animal behaviour (EIPAAB): a systematic map and open access database” (doi: XXXXX).
It is designed to act as a starting point for anyone who wishes to use the ‘Evidence of the Impacts of Pharmaceuticals on Aquatic Animal Behaviour’ (EIPAAB) database for their own projects.
This script was authored by Jake M Martin (jakemartin.org)
Contact: jake.martin@deakin.edu or jake.martin@slu.se
P.S. Apologies for any spelling mistakes in the script; I am dyslexic and this is a very long document.
If you are not familiar with R, here’s a beginner’s guide (https://www.youtube.com/watch?v=_V8eKsto3Ug).
Here’s a link to download R (https://cran.r-project.org/) and RStudio (https://posit.co/products/open-source/rstudio/), and a guide on how to do so.
This is an R Markdown file, which makes annotating and running R code more user friendly. It is also easy to reproduce and share in a variety of formats (e.g. PDF). The R code is embedded within chunks, and the output of the code appears under each chunk.
# This is a chunk
All other text outside of the chunks is annotation (like this). Hashtags used outside of chunks create headers and structure the file. Hashtags within chunks are used for more precise annotation of the code.
If you are not familiar with R Markdown, here’s a guide (https://www.youtube.com/watch?v=tKUufzpoHDE).
Creating our input and output directories. They will be made within the current parent directory (i.e. where the R script is saved).
This code creates a folder and saves its path as figures_path. This is where we will export our figures.
figures_path <- paste0(getwd(), "/figures")
if (!dir.exists(figures_path)) {
dir.create(figures_path)
}
This code creates a folder and saves its path as output_path. This is where we will export our data.
output_path <- paste0(getwd(), "/output-data")
if (!dir.exists(output_path)) {
dir.create(output_path)
}
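This code creates the input folder and saves its path as input_path. This is where the database and READ-ME files are read from.
input_path <- paste0(getwd(), "/input-data")
if (!dir.exists(input_path)) {
dir.create(input_path)
}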
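The three chunks above repeat the same create-if-missing pattern. As a minimal sketch, the pattern could be wrapped in a small helper. Note that `ensure_dir()` is a hypothetical name that is not used anywhere else in this script, and the demonstration writes to a temporary directory (an assumption made so that nothing is created in your project folder).

```r
# Hypothetical helper (not part of the original script): create a sub-directory
# if it does not exist yet and return its path.
ensure_dir <- function(parent, name) {
  path <- file.path(parent, name)        # builds "parent/name" in a portable way
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)   # also creates missing parent folders
  }
  path
}

# Demonstration in a temporary directory rather than getwd()
demo_parent <- file.path(tempdir(), "eipaab-demo")
demo_paths <- vapply(c("figures", "output-data", "input-data"),
                     function(x) ensure_dir(demo_parent, x),
                     character(1))
all(dir.exists(demo_paths))
```

Using file.path() instead of paste0() avoids hard-coding the path separator, and calling the helper a second time is harmless because existing folders are left untouched.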
The ‘Evidence of the Impacts of Pharmaceuticals on Aquatic Animal Behaviour’ (EIPAAB) database has 96 columns and 1740 rows. The columns represent various forms of metadata extracted from articles that were included in Martin et al. 2024 “Evidence of the impacts of pharmaceuticals on aquatic animal behaviour: a systematic map and open access database” (doi: XXXXX).
The READ-ME file, which explains what each metadata column is, how it was extracted, what structure it has, and at what level it applies, is available at XXX. Below I have imported the READ-ME for accessibility. I highly recommend you read the READ-ME before conducting any of your own meta-analysis to make sure you have interpreted the data correctly.
More generally, column names that start with ‘validity’ are metadata relating to study validity, those that start with ‘species’ relate to species information (population), those that start with ‘compound’ relate to the chemical information (exposure), and those that start with ‘behav’ relate to behaviour information (outcome). The order of columns reflects both the level the metadata is extracted at (i.e. article level or species by compound level; see level in READ-ME) and the general category of metadata (i.e. validity, species, compound, behaviour).
setwd(input_path)
READ_ME <- read.csv("READ-ME.csv", na = "NA") # loading the READ-ME file
These are the R packages required to run the script. The chunk below loads them all in one go with pacman::p_load(); any package that is not already installed is installed first. A commented-out alternative, which puts the packages in a list (required_packages) and loads them with a function (loaded_packages), is kept further down.
If you want to install and load each package separately, you can use install.packages() and require(); I have given an example below.
# this installs and loads the packages
# pacman itself needs to be installed first
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
pacman::p_load("tidyverse",
"ggraph",
"igraph",
"ggrepel",
"RColorBrewer",
"ggtree",
"treeio",
"ape",
"gridExtra",
"ggdist",
"highcharter",
"pander"
)
# required_packages <- c("tidyverse",
# "ggraph",
# "igraph",
# "ggrepel",
# "RColorBrewer",
# "ggtree",
# "treeio",
# "ape",
# "gridExtra",
# "ggdist",
# "highcharter",
# "pander"
# )
# Alternatively, you can install and load them one by one
# install.packages("tidyverse")
# require("tidyverse")
This is the function to load all packages (and install if necessary) in the list above.
type = ‘source’ instructs R to download and install the package from its source code rather than from a precompiled binary; this is optional.
# loaded_packages <- lapply(required_packages, function(package) {
# if (!require(package, character.only = TRUE)) {
# install.packages(package, type = 'source') # type = 'source' is optional
# if (!require(package, character.only = TRUE)) {
# return(FALSE)
# }
# }
# return(TRUE)
# })
#
# # Check if all packages are loaded successfully
# if (all(unlist(loaded_packages))) {
# cat("All packages loaded\n")
# } else {
# cat("Some packages failed to load or install\n")
# }
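Before installing anything, it can be handy to see which packages are actually missing. The lines below are only a sketch of one way to check: `packages_needed` and `missing_packages` are made-up names, the check relies on base R's installed.packages(), and nothing is installed or loaded.

```r
# Hypothetical check (not part of the original script): which of the
# required packages are not yet installed in the current library paths?
packages_needed <- c("tidyverse", "ggraph", "igraph", "ggrepel", "RColorBrewer",
                     "ggtree", "treeio", "ape", "gridExtra", "ggdist",
                     "highcharter", "pander")

installed <- rownames(installed.packages())                  # names of installed packages
missing_packages <- packages_needed[!packages_needed %in% installed]

if (length(missing_packages) == 0) {
  cat("All packages are installed\n")
} else {
  cat("Not installed:", paste(missing_packages, collapse = ", "), "\n")
}
```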
For ggtree and treeio you may need to run this code for installation.
# if (!requireNamespace("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#
# BiocManager::install("ggtree")
Importing the EIPAAB-database.csv database (accessed from: https://osf.io/atwy6/).
If the CSV files are in the same working directory (wd) as this R script, you will not need to use setwd(). If the files are located elsewhere, you will need to specify this in setwd() and run all lines of the chunk at once. In R Markdown the working directory changes back to the default after the chunk is run.
setwd(input_path)
EIPAAB_database <- read.csv("EIPAAB-database.csv", na = "NA")
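An alternative to calling setwd() inside a chunk is to build the full file path with file.path(). The sketch below writes a tiny stand-in CSV to a temporary folder and reads it back; the file name, its contents and `demo_input_path` are invented for illustration and are not the real database.

```r
# Stand-in for input_path (assumption: a temporary folder is used here)
demo_input_path <- tempdir()
demo_file <- file.path(demo_input_path, "demo-database.csv")

# Tiny made-up file; the real database has many more columns and rows
write.csv(data.frame(article_id = c("a1", "a1", "a2"),
                     year = c(2001, 2001, NA)),
          demo_file, row.names = FALSE)

# na.strings is the full name of the argument written as `na` above,
# which R resolves through partial argument matching
demo_db <- read.csv(demo_file, na.strings = "NA")
nrow(demo_db)
```

Because the path is passed directly to read.csv(), the working directory never changes, so the lines do not have to be run together.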
The first thing we will look at is how many unique (distinct) articles there are in the database, and how many rows of data there are.
There are 901 articles, with 1740 rows.
EIPAAB_database %>%
dplyr::distinct(article_id) %>% # Returns a list of distinct article_id
nrow(.) # Returns the length of the current file (which is the list of distinct article_id)
## [1] 901
EIPAAB_database %>%
nrow(.) # Returns the length of the current file (which is the length of the whole datafile)
## [1] 1740
Each row represents a unique species by compound combination within a given article. This is identified by the column unique_row_id, which is a combination of the extractor’s response id, the species, and the compound. For example, in R_0Bqz2RQ4JxPfBkZ_Danio_rerio_Diazepam, response id = R_0Bqz2RQ4JxPfBkZ, species = Danio rerio, and compound = Diazepam.
EIPAAB_database %>%
dplyr::select(unique_row_id) %>% # selects just the unique_row_id column
dplyr::arrange(unique_row_id) %>% # arranges the column alphabetically so the same examples will be given every time
dplyr::slice(1:10) # Returns only the first 10 rows
## unique_row_id
## 1 R_0Bqz2RQ4JxPfBkZ_Danio_rerio_Diazepam
## 2 R_0CHlDBs9ipt4suZ_Astyanax_mexicanus_Aripiprazole
## 3 R_0Ck0AOjLDWukBUt_Procambarus_clarkii_Chlordiazepoxide
## 4 R_0JvaI9dlvTbozUl_Daphnia_magna_Fluoxetine
## 5 R_0JvaI9dlvTbozUl_Daphnia_magna_Sertraline
## 6 R_0Srt7zn9MwHKne1_Danio_rerio_Escitalopram
## 7 R_0p8ZEROmCGlSR7r_Oryzias_latipes_Fluoxetine
## 8 R_10C0XxjAUoZmibO_Amphiprion_ocellaris_17-alpha-ethinylestradiol
## 9 R_10GdzsXlrkwamUt_Daphnia_magna_Cisplatin
## 10 R_10NOT0XWL5TXN5m_Coenagrion_hastulatum_Diphenhydramine
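To illustrate how an identifier of this form can be put together, and how to check that it really is unique per row, here is a small sketch. The data frame and the `R_demoA` response id are invented; only the first row mirrors the example given in the text, and the way the real column was built during extraction may differ.

```r
# Made-up rows: one response id can cover several compounds
demo_rows <- data.frame(
  response_id = c("R_0Bqz2RQ4JxPfBkZ", "R_demoA", "R_demoA"),
  species     = c("Danio rerio", "Daphnia magna", "Daphnia magna"),
  compound    = c("Diazepam", "Fluoxetine", "Sertraline")
)

# Join response id, species (spaces replaced by underscores) and compound
demo_rows$unique_row_id <- paste(demo_rows$response_id,
                                 gsub(" ", "_", demo_rows$species),
                                 demo_rows$compound,
                                 sep = "_")

demo_rows$unique_row_id[1]
# anyDuplicated() returns 0 when every identifier occurs only once
anyDuplicated(demo_rows$unique_row_id) == 0
```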
Now the total number of treatments represented in the data; this is the total number of unique doses per species by compound combination.
In the map, the number of treatments was only extracted for water-borne exposures; the NAs represent other exposure routes. The number of water-borne exposure treatments is therefore 6154, and there are an additional 226 rows (species by compound combinations) that do not have treatment numbers. We know they all have at least two treatments, a control and a compound of interest, because that is part of the inclusion criteria. So we could add the number of NAs * 2 to the total, which gives 6606 total treatment groups, although this is likely an underestimate of the true total.
EIPAAB_database %>%
dplyr::summarise(
groups = sum(compound_treatment_levels, na.rm = TRUE), # Calculate the sum of 'compound_treatment_levels' while ignoring NA values
nas = sum(is.na(compound_treatment_levels)), # Count the number of NA values in 'compound_treatment_levels'
total = groups + (nas * 2) # Calculate the total by adding 'groups' to twice the number of NAs
)
## groups nas total
## 1 6154 226 6606
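The lower-bound arithmetic can be checked by hand. The sketch below first applies it to a made-up vector of treatment counts (the values are invented) and then repeats the calculation with the two numbers printed above.

```r
# Invented treatment counts; NA stands for a non-water-borne exposure
demo_levels <- c(3, 5, NA, 2, NA)

demo_groups <- sum(demo_levels, na.rm = TRUE)   # known treatments: 3 + 5 + 2
demo_nas    <- sum(is.na(demo_levels))          # rows without a count
demo_total  <- demo_groups + demo_nas * 2       # assume at least 2 treatments per NA row

# The same calculation with the values from the output above
reported_total <- 6154 + 226 * 2
c(demo_total = demo_total, reported_total = reported_total)
```

The factor of 2 is the minimum implied by the inclusion criteria (a control plus one compound treatment), which is why the result is a lower bound.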
Let’s look at how the evidence collected breaks down by the three study motivations
total_atricles <- EIPAAB_database %>%
dplyr::distinct(article_id) %>%
nrow()
# Analyze the study motivations in the dataset
EIPAAB_database %>%
dplyr::group_by(article_id) %>% # Group the data by 'article_id'
dplyr::sample_n(1) %>% # Randomly sample one row from each group (i.e., each unique 'article_id')
dplyr::ungroup() %>% # Ungroup the data to remove the previous grouping
dplyr::group_by(study_motivation) %>% # Group the data by 'study_motivation'
dplyr::reframe( # Create a summary data frame with the count and percentage of each study motivation
n = length(study_motivation), # Count the number of occurrences of each study motivation
`%` = round(n / total_atricles, 3) * 100 # Calculate the percentage of total articles
) %>%
dplyr::arrange(desc(n)) # Arrange the resulting data frame in descending order of the count
## # A tibble: 3 × 3
## study_motivation n `%`
## <chr> <int> <dbl>
## 1 Environmental 510 56.6
## 2 Medical 234 26
## 3 Basic research 157 17.4
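The chunk above picks one random row per article with sample_n(1). A deterministic alternative, sketched here on made-up data, is distinct(article_id, .keep_all = TRUE), which always keeps the first row of each article. The sketch assumes that study_motivation is constant within an article; that is an assumption and has not been checked against the real database.

```r
library(dplyr)

# Invented example: five rows from three articles
demo_db <- data.frame(
  article_id = c("a1", "a1", "a2", "a3", "a3"),
  study_motivation = c("Environmental", "Environmental", "Medical",
                       "Basic research", "Basic research")
)

motivation_summary <- demo_db %>%
  distinct(article_id, .keep_all = TRUE) %>%      # first row of each article, no randomness
  count(study_motivation, name = "n") %>%         # articles per motivation
  mutate(percent = round(n / sum(n) * 100, 1)) %>%
  arrange(desc(n))

motivation_summary
```

If an article did carry more than one motivation, a random draw could give slightly different counts between runs, whereas this version would always return the same table.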
Here we are changing the order of these in the database to “Environmental”, “Medical”, “Basic research” for plots.
EIPAAB_database <- EIPAAB_database %>%
dplyr::mutate(study_motivation = fct_relevel(study_motivation, "Environmental", "Medical", "Basic research"))
The year range is 1974 to 2022, so 48 years’ worth of empirical research has contributed to this evidence base.
EIPAAB_database %>%
dplyr::reframe(min_year = min(year),
max_year = max(year),
total_years = max_year-min_year)
## min_year max_year total_years
## 1 1974 2022 48
Now making a summary for the number of publications per year based on study motivation
# Create a complete sequence of years and all unique study motivations
all_years <- as.character(1974:2022)
all_study_motivations <- unique(EIPAAB_database$study_motivation)
# Create a data frame with all combinations of year and study motivation
all_combinations <- expand.grid(year = all_years, study_motivation = all_study_motivations, stringsAsFactors = FALSE)
# Summarize the data
pub_year <- EIPAAB_database %>%
group_by(year, study_motivation) %>%
summarize(n = length(unique(article_id)), .groups = 'drop') %>%
mutate(year = as.character(year))
# Join with the complete grid of years and study motivations
pub_year_complete <- all_combinations %>%
left_join(pub_year, by = c("year", "study_motivation")) %>%
mutate(n = if_else(is.na(n), 0, n),
year = as.numeric(year))
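The grid-and-join approach above fills in years that have no publications. The same idea can be sketched more compactly with tidyr::complete(); the data below are invented, and this is an alternative illustration rather than the method used for the figures.

```r
library(dplyr)
library(tidyr)

# Invented counts with gaps: no rows for 2001, and no Medical row for 2000
demo_counts <- data.frame(
  year = c(2000L, 2002L, 2002L),
  study_motivation = c("Environmental", "Environmental", "Medical"),
  n = c(2, 1, 4)
)

demo_complete <- demo_counts %>%
  complete(year = 2000:2002, study_motivation, fill = list(n = 0)) %>%  # add missing combinations with n = 0
  arrange(study_motivation, year)

demo_complete
```

Filling the gaps with zeros matters for the cumulative sums calculated later, because a missing year would otherwise be skipped rather than counted as zero.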
Here’s a summary figure for the manuscript (MS).
# Define the colour palette
motivation_colour_theme <- c("#60BD6C", "#D359A1", "#3C82C4") # Making colour theme to apply to plot
# Create the plot
pub_year_fig <- pub_year_complete %>%
# Pool all years up to and including 1996, then reformat the year column
dplyr::mutate(
year = as.character(if_else(year < 1996, 1996, year)), # Years before 1996 are assigned to 1996
year = if_else(year == "1996", "<1997", year) # Label the pooled group (1974 to 1996) as "<1997"
) %>%
# Creating the plot
ggplot(aes(y = n, x = year, fill = study_motivation)) +
geom_bar(stat = "identity", width = 0.9) +
# Apply the custom colours
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
# Customizing the theme
theme(
legend.position = c(0.05, 1), # Positioning the legend in the top-left corner within the plot
legend.justification = c(0, 1), # Ensuring the legend box aligns properly at the top-left corner
axis.text.x = element_text(angle = 90, vjust = 0.5) # Rotating x-axis labels for better readability
) +
# Adding axis labels
labs(
x = "Year of publication",
y = "Number of articles"
)
# Display the plot
pub_year_fig
Saving the figure as a PDF
setwd(figures_path)
#ggsave("pub_year_fig.png", plot = pub_year_fig, width = 10, height = 5, dpi = 300) #if you want to save a png
ggsave("study_pub_year_fig.pdf", plot = pub_year_fig, width = 10, height = 5)
Making values for cumulative and relative growth in articles. This is the cumulative number of articles per year for each study motivation, as well as the relative growth based on 2007. We selected 2007 for a 15-year overview of growth.
pub_year_growth <- pub_year_complete %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(
n_cumulative = cumsum(n), # Calculate the cumulative sum of 'n'
n_cumulative_prop = n_cumulative / max(n_cumulative), # Calculate the cumulative proportion
n_2007 = ifelse(year == 2007, n, NA_real_), # Get n value for year 2007
n_2007 = first(na.omit(n_2007)), # Propagate the n_2007 value within the group
n_ratio_to_2007 = n / n_2007 # Calculate number of articles relative to that of 2007
) %>%
dplyr::ungroup() %>%
dplyr::select(study_motivation, year, n, n_cumulative, n_cumulative_prop, n_ratio_to_2007)
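To make the three growth measures concrete, here is a small worked sketch on invented counts. It follows the same logic as the chunk above but looks up the 2007 baseline by indexing, which assumes each group has exactly one 2007 row.

```r
library(dplyr)

# Invented yearly counts for two motivations
demo_pubs <- data.frame(
  study_motivation = rep(c("Environmental", "Medical"), each = 4),
  year = rep(2006:2009, times = 2),
  n = c(1, 2, 4, 6,  3, 3, 3, 6)
)

demo_growth <- demo_pubs %>%
  arrange(study_motivation, year) %>%
  group_by(study_motivation) %>%
  mutate(
    n_cumulative = cumsum(n),                              # running total of articles
    n_cumulative_prop = n_cumulative / max(n_cumulative),  # share of the final total reached
    n_ratio_to_2007 = n / n[year == 2007]                  # yearly count relative to 2007
  ) %>%
  ungroup()

demo_growth
```

For the invented Environmental series the cumulative counts are 1, 3, 7 and 13, and the 2009 ratio is 6 / 2 = 3. Note that a baseline count of zero would give infinite or undefined ratios, so the choice of baseline year matters.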
Making a plot of cumulative growth for each motivation since 1974 (the first identified study)
cumulative_articles_fig <- pub_year_growth %>%
ggplot(aes(y = n_cumulative, x = year, colour = study_motivation)) +
geom_line(stat = "identity", linewidth = 1.5) +
geom_hline(yintercept = 0) +
scale_x_continuous(breaks = seq(1974, 2022, by = 1)) +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
# Customizing the theme
theme(
legend.position = c(0.05, 1), # Positioning the legend in the top-left corner within the plot
legend.justification = c(0, 1), # Ensuring the legend box aligns properly at the top-left corner
axis.text.x = element_text(angle = 90, vjust = 0.5) # Rotating x-axis labels for better readability
) +
# Adding axis labels
labs(
x = "Year of publication",
y = "Articles cumulative growth"
)
cumulative_articles_fig
setwd(figures_path)
ggsave("study_cumulative_articles_fig.pdf", plot = cumulative_articles_fig, width = 10, height = 5)
Let’s look at relative growth compared to the research area more broadly
I will identify the most common research area based on each study motivation.
Environmental motivation = Environmental Sciences & Ecology
Medical motivation = Neurosciences & Neurology
Basic research = Neurosciences & Neurology
EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(study_motivation, wos_research_areas) %>%
dplyr::reframe(n = length(doi)) %>%
dplyr::mutate(wos_research_areas = str_trim(wos_research_areas)) %>%
tidyr::separate_rows(wos_research_areas, sep = ";") %>%
dplyr::mutate(wos_research_areas = str_trim(wos_research_areas)) %>%
dplyr::group_by(study_motivation, wos_research_areas) %>%
dplyr::reframe(n_total = sum(n)) %>%
dplyr::arrange(desc(n_total)) %>%
dplyr::arrange(study_motivation) %>%
dplyr::group_by(study_motivation) %>%
dplyr::slice(1:2) %>%
dplyr::ungroup()
## # A tibble: 6 × 3
## study_motivation wos_research_areas n_total
## <fct> <chr> <int>
## 1 Environmental Environmental Sciences & Ecology 321
## 2 Environmental Toxicology 193
## 3 Medical Neurosciences & Neurology 106
## 4 Medical Pharmacology & Pharmacy 72
## 5 Basic research Neurosciences & Neurology 56
## 6 Basic research Behavioral Sciences 43
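The key step above is splitting the semicolon-separated wos_research_areas field so that each research area gets its own row before counting. Here is a minimal sketch of that step on invented records; it uses base R's trimws() in place of str_trim() purely to keep the example short.

```r
library(dplyr)
library(tidyr)

# Invented records: one article can list several research areas
demo_areas <- data.frame(
  study_motivation = c("Environmental", "Environmental", "Medical"),
  wos_research_areas = c("Environmental Sciences & Ecology; Toxicology",
                         "Toxicology",
                         "Neurosciences & Neurology; Pharmacology & Pharmacy")
)

demo_area_counts <- demo_areas %>%
  separate_rows(wos_research_areas, sep = ";") %>%               # one row per research area
  mutate(wos_research_areas = trimws(wos_research_areas)) %>%    # drop the space left after ";"
  count(study_motivation, wos_research_areas, name = "n_total") %>%
  arrange(study_motivation, desc(n_total))

demo_area_counts
```

Because an article is counted once for every area it lists, the totals per motivation can add up to more than the number of articles.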
We will now compare the proportional cumulative growth of each study motivation against the most common research areas based on Web of Science (WoS).
I searched for articles published within these research areas from 1992 to 2022 and created a database to compare against.
Each search included only a date range (e.g. PY=(1992-2021)) AND the given Web of Science research area (e.g. WC=(Pharmacology & Pharmacy)). Searches were done on 04/07/2024. Only the total number of articles each year was taken.
First we will import the database I created of the annual number of articles per research field (martin-et-al-supp-file-9-wos-research-areas-1992-2022.csv). It is provided as supplementary file 9.
setwd(input_path)
wos_research_areas_n <- read.csv("martin-et-al-supp-file-9-wos-research-areas-1992-2022.csv") %>%
dplyr::arrange(year) %>%
dplyr::group_by(research_area) %>%
dplyr::mutate(
n_cumulative = cumsum(n), # Calculate the cumulative sum of 'n'
n_cumulative_prop = n_cumulative / max(n_cumulative), # Calculate the cumulative proportion
n_2007 = ifelse(year == 2007, n, NA_real_), # Get n value for year 2007
n_2007= first(na.omit(n_2007)), # Propagate the n_2007 value within the group
n_ratio_to_2007 = n / n_2007 # Calculate number of articles relative to that of 2007
) %>%
dplyr::ungroup() %>%
dplyr::select(research_area, year, n, n_cumulative, n_cumulative_prop, n_ratio_to_2007)
Combining the number of articles with those in the EIPAAB database
pub_year_growth_comp <- pub_year_growth %>%
dplyr::rename(research_area = study_motivation)
wos_research_areas_comp <- wos_research_areas_n %>%
rbind(., pub_year_growth_comp) %>%
dplyr::mutate(research_area = factor(research_area, levels=c("Environmental", "Medical", "Basic research",
"Environmental Sciences and Ecology", "Toxicology",
"Neurosciences and Neurology", "Pharmacology and Pharmacy",
"All Research Areas")))
Let’s see what the relative growth was in 2022 (the latest included year in the evidence base)
wos_research_areas_comp %>%
dplyr::filter(year == 2022)
## # A tibble: 8 × 6
## research_area year n n_cumulative n_cumulative_prop n_ratio_to_2007
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Pharmacology and … 2022 8.04e4 1458833 1 1.77
## 2 Neurosciences and… 2022 2.44e4 450515 1 1.60
## 3 Toxicology 2022 1.69e4 391858 1 1.41
## 4 Environmental Sci… 2022 4.60e3 103123 1 1.07
## 5 All Research Areas 2022 3.62e6 66839129 1 1.79
## 6 Environmental 2022 5.7 e1 510 1 19
## 7 Medical 2022 3 e1 234 1 10
## 8 Basic research 2022 9 e0 158 1 2.25
Let’s now compare the relative growth in Environmental research
# Define the colour palette
env_colour_theme <- c("#60BD6C", "#2E4B22", "black") # Making colour theme to apply to plot
enviro_comp <- c("Environmental", "Environmental Sciences and Ecology", "All Research Areas")
line_types <- c("Environmental" = "solid",
"Environmental Sciences and Ecology" = "dashed",
"All Research Areas" = "solid")
relative_growth_env_fig <- wos_research_areas_comp %>%
dplyr::filter(research_area %in% enviro_comp, year > 2006) %>%
ggplot(aes(y = n_ratio_to_2007, x = year, colour = research_area, linetype = research_area)) +
geom_line(stat = "identity", linewidth = 1.5) +
scale_x_continuous(breaks = seq(2007, 2022, by = 1)) +
scale_y_continuous(limits = c(0, 22), breaks = seq(0, 22, by = 4)) + # Scale Y axis from 0 to 22
scale_colour_manual(values = env_colour_theme, name = "Research Area") +
scale_linetype_manual(values = line_types) + # Apply manual line types
guides(linetype = "none") + # Remove legend for line types
theme_classic() +
# Customizing the theme
theme(
legend.position = c(0.05, 1), # Positioning the legend in the top-left corner within the plot
legend.justification = c(0, 1), # Ensuring the legend box aligns properly at the top-left corner
axis.text.x = element_text(angle = 90, vjust = 0.5) # Rotating x-axis labels for better readability
) +
# Adding axis labels
labs(
x = "Year of publication",
y = "Relative growth compared to 2007 (15 year baseline)"
)
relative_growth_env_fig
setwd(figures_path)
ggsave("study_relative_growth_env_fig.pdf", plot = relative_growth_env_fig, width = 5, height = 5)
Comparing the relative growth in medical research
# Define the colour palette
med_colour_theme <- c("#D359A1", "#D2137F", "black") # Making colour theme to apply to plot
med_comp <- c("Medical", "Neurosciences and Neurology", "All Research Areas")
line_types <- c("Medical" = "solid",
"Neurosciences and Neurology" = "dashed",
"All Research Areas" = "solid")
relative_growth_med_fig <- wos_research_areas_comp %>%
dplyr::filter(research_area %in% med_comp, year > 2006) %>%
ggplot(aes(y = n_ratio_to_2007, x = year, colour = research_area, linetype = research_area)) +
geom_line(stat = "identity", linewidth = 1.5) +
scale_x_continuous(breaks = seq(2007, 2022, by = 1)) +
scale_y_continuous(limits = c(0, 22), breaks = seq(0, 22, by = 4)) + # Scale Y axis from 0 to 22
scale_colour_manual(values = med_colour_theme, name = "Research Area") +
scale_linetype_manual(values = line_types) + # Apply manual line types
guides(linetype = "none") + # Remove legend for line types
theme_classic() +
# Customizing the theme
theme(
legend.position = c(0.05, 1), # Positioning the legend in the top-left corner within the plot
legend.justification = c(0, 1), # Ensuring the legend box aligns properly at the top-left corner
axis.text.x = element_text(angle = 90, vjust = 0.5) # Rotating x-axis labels for better readability
) +
# Adding axis labels
labs(
x = "Year of publication",
y = "Relative growth compared to 2007 (15 year baseline)"
)
relative_growth_med_fig
setwd(figures_path)
ggsave("study_relative_growth_med_fig.pdf", plot = relative_growth_med_fig, width = 5, height = 5)
Comparing relative growth in basic research
# Define the colour palette
basic_colour_theme <- c("#3C82C4", "#26276D", "black") # Making colour theme to apply to plot
basic_comp <- c("Basic research", "Neurosciences and Neurology", "All Research Areas")
line_types <- c("Basic research" = "solid",
"Neurosciences and Neurology" = "dashed",
"All Research Areas" = "solid")
relative_growth_base_fig <- wos_research_areas_comp %>%
dplyr::filter(research_area %in% basic_comp, year > 2006) %>%
ggplot(aes(y = n_ratio_to_2007, x = year, colour = research_area, linetype = research_area)) +
geom_line(stat = "identity", linewidth = 1.5) +
scale_x_continuous(breaks = seq(2007, 2022, by = 1)) +
scale_y_continuous(limits = c(0, 22), breaks = seq(0, 22, by = 4)) + # Scale Y axis from 0 to 22
scale_colour_manual(values = basic_colour_theme, name = "Research Area") +
scale_linetype_manual(values = line_types) + # Apply manual line types
guides(linetype = "none") + # Remove legend for line types
theme_classic() +
# Customizing the theme
theme(
legend.position = c(0.05, 1), # Positioning the legend in the top-left corner within the plot
legend.justification = c(0, 1), # Ensuring the legend box aligns properly at the top-left corner
axis.text.x = element_text(angle = 90, vjust = 0.5) # Rotating x-axis labels for better readability
) +
# Adding axis labels
labs(
x = "Year of publication",
y = "Relative growth compared to 2007 (15 year baseline)"
)
relative_growth_base_fig
setwd(figures_path)
ggsave("study_relative_growth_base_fig.pdf", plot = relative_growth_base_fig, width = 5, height = 5)
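The three relative-growth chunks above differ only in the research areas, colours and output file. One possible way to cut the repetition is a small plotting function. plot_relative_growth() below is a hypothetical helper (not part of the original script), shown on invented numbers, and it leaves out the axis scaling and legend placement used above.

```r
library(ggplot2)

# Hypothetical helper: one relative-growth plot for a chosen set of research areas
plot_relative_growth <- function(data, areas, colours, line_types) {
  data <- data[data$research_area %in% areas & data$year > 2006, ]  # keep 2007 onwards
  data$research_area <- factor(data$research_area, levels = areas)  # fix the legend order
  ggplot(data, aes(x = year, y = n_ratio_to_2007,
                   colour = research_area, linetype = research_area)) +
    geom_line(linewidth = 1.5) +
    scale_colour_manual(values = colours, name = "Research Area") +
    scale_linetype_manual(values = line_types) +
    guides(linetype = "none") +
    theme_classic() +
    labs(x = "Year of publication",
         y = "Relative growth compared to 2007 (15 year baseline)")
}

# Invented ratios for two series
demo_comp <- data.frame(
  research_area = rep(c("Environmental", "All Research Areas"), each = 3),
  year = rep(2007:2009, times = 2),
  n_ratio_to_2007 = c(1, 2, 4, 1, 1.1, 1.2)
)

demo_fig <- plot_relative_growth(
  demo_comp,
  areas = c("Environmental", "All Research Areas"),
  colours = c("Environmental" = "#60BD6C", "All Research Areas" = "black"),
  line_types = c("Environmental" = "solid", "All Research Areas" = "solid")
)
demo_fig
```

Naming the colour and line-type vectors ties each value to a research area explicitly, so the mapping does not depend on the order of the factor levels.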
Looking at the link between the PEO (population, exposure, outcome) elements: what was the most common study design?
Below I have made a table that groups by these elements to see the most common study design.
It was 1 compound, 1 species and 1 behavioural class (41.5%).
behav_boolean <- c("behav_movement_boolean", "behav_boldness_boolean", "behav_foraging_boolean", "behav_antipredator_boolean", "behav_mating_boolean", "behav_post_mating_boolean", "behav_agression_boolean", "behav_sociality_boolean", "behav_cognition_boolean", "behav_noncat_boolean")
EIPAAB_database %>%
dplyr::mutate(behav_n = rowSums(across(all_of(behav_boolean)), na.rm = TRUE)) %>% # how many behav class measured
dplyr::group_by(article_id) %>%
dplyr::arrange(desc(behav_n)) %>%
dplyr::slice(1) %>%
dplyr::ungroup() %>%
dplyr::select(compound_n, species_n, behav_n) %>%
dplyr::group_by(compound_n, species_n, behav_n) %>%
dplyr::reframe(n= length(compound_n),
'%' = round(n/901*100,1)) %>% # 901 = number of distinct articles
dplyr::arrange(desc(n))
## # A tibble: 45 × 5
## compound_n species_n behav_n n `%`
## <int> <int> <dbl> <int> <dbl>
## 1 1 1 1 374 41.5
## 2 1 1 2 160 17.8
## 3 2 1 1 78 8.7
## 4 1 1 3 63 7
## 5 3 1 1 45 5
## 6 2 1 2 34 3.8
## 7 4 1 1 24 2.7
## 8 3 1 2 15 1.7
## 9 1 2 1 13 1.4
## 10 5 1 1 9 1
## # ℹ 35 more rows
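The behav_n column is simply a row-wise count of the behaviour indicator columns. Here is a minimal sketch on invented data, assuming the indicators are coded 0/1 (check the READ-ME for the actual coding); slice_max() is used here as a compact stand-in for the arrange() and slice(1) steps above.

```r
library(dplyr)

# Three of the indicator columns, for illustration only
demo_behav_cols <- c("behav_movement_boolean", "behav_boldness_boolean",
                     "behav_foraging_boolean")

# Invented rows: article a1 has two rows, article a2 has one
demo_db <- data.frame(
  article_id = c("a1", "a1", "a2"),
  behav_movement_boolean = c(1, 1, 0),
  behav_boldness_boolean = c(0, 1, NA),
  behav_foraging_boolean = c(0, 0, 1)
)

demo_design <- demo_db %>%
  mutate(behav_n = rowSums(across(all_of(demo_behav_cols)), na.rm = TRUE)) %>%  # classes measured per row
  group_by(article_id) %>%
  slice_max(behav_n, n = 1, with_ties = FALSE) %>%   # keep the row with the most classes per article
  ungroup()

demo_design
```

Setting na.rm = TRUE treats a missing indicator as "not measured", so the NA in the last row does not turn the whole count into NA.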
A summary data frame that has the number of PEO elements in each of the 901 studies
PEO_element_summary <- EIPAAB_database %>%
dplyr::mutate(behav_n = rowSums(across(all_of(behav_boolean)), na.rm = TRUE)) %>%
dplyr::group_by(article_id) %>%
dplyr::arrange(desc(behav_n)) %>%
dplyr::slice(1) %>%
dplyr::ungroup() %>%
dplyr::select(compound_n, species_n, behav_n)
Looking at the number of species used per article (here the `%` column is shown as a proportion of the 901 articles)
PEO_element_summary %>%
dplyr::group_by(species_n) %>%
dplyr::reframe(n = length(species_n),
'%' = n/901)
## # A tibble: 5 × 3
## species_n n `%`
## <int> <int> <dbl>
## 1 1 873 0.969
## 2 2 25 0.0277
## 3 3 1 0.00111
## 4 4 1 0.00111
## 5 5 1 0.00111
Looking at the number of compounds used
PEO_element_summary %>%
dplyr::group_by(compound_n) %>%
dplyr::reframe(n = length(compound_n),
'%' = n/901)
## # A tibble: 18 × 3
## compound_n n `%`
## <int> <int> <dbl>
## 1 1 624 0.693
## 2 2 127 0.141
## 3 3 67 0.0744
## 4 4 32 0.0355
## 5 5 16 0.0178
## 6 6 8 0.00888
## 7 7 6 0.00666
## 8 8 5 0.00555
## 9 9 1 0.00111
## 10 10 2 0.00222
## 11 11 3 0.00333
## 12 12 2 0.00222
## 13 13 2 0.00222
## 14 14 2 0.00222
## 15 16 1 0.00111
## 16 18 1 0.00111
## 17 25 1 0.00111
## 18 52 1 0.00111
Looking at the number of behavioural classes measured
PEO_element_summary %>%
dplyr::group_by(behav_n) %>%
dplyr::reframe(n = length(species_n),
'%' = n/901)
## # A tibble: 6 × 3
## behav_n n `%`
## <dbl> <int> <dbl>
## 1 1 583 0.647
## 2 2 227 0.252
## 3 3 78 0.0866
## 4 4 10 0.0111
## 5 5 2 0.00222
## 6 7 1 0.00111
Let’s take a closer look at species information
There are 173 species in the EIPAAB database
EIPAAB_database %>%
dplyr::distinct(species_name) %>%
nrow()
## [1] 173
The number of species for each study motivation
EIPAAB_database %>%
dplyr::group_by(study_motivation) %>%
dplyr::reframe(n_spp = length(unique(species_name)),
total_study = length(unique(article_id)),
rel_n = n_spp/total_study)
## # A tibble: 3 × 4
## study_motivation n_spp total_study rel_n
## <fct> <int> <int> <dbl>
## 1 Environmental 143 510 0.280
## 2 Medical 26 234 0.111
## 3 Basic research 43 158 0.272
There are 21 taxonomic classes
EIPAAB_database %>%
dplyr::distinct(species_class) %>%
nrow()
## [1] 21
There are 935 different groups of animals used across all 901 studies (i.e. some studies had more than one species)
EIPAAB_database %>%
dplyr::distinct(unique_population_id) %>%
nrow(.)
## [1] 935
Let’s make a cladogram to get an overview of which taxa are in the database
spp_taxonomy <- EIPAAB_database %>%
dplyr::group_by(species_name) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::filter(!is.na(species_family), !str_detect(species_species, "spp."), species_kingdom != "Chromista") %>%
dplyr::select("species_kingdom", "species_phylum", "species_class", "species_order",
"species_family", "species_genus", "species_species") %>%
dplyr::mutate(species_species = paste0(substr(species_genus, 1, 1), ". ", sub("^[^ ]+ ", "", species_species)))
# The mutate changes the spp name to abbreviate the Genus (e.g. Aeshna cyanea to A. cyanea)
Create a hierarchical structure for the plot
taxonomy <- spp_taxonomy[, c("species_kingdom", "species_phylum", "species_class", "species_order", "species_family", "species_genus", "species_species")]
taxonomy[] <- lapply(taxonomy, factor)
# Create a phylogenetic tree
phylo_tree <- as.phylo.formula(~species_kingdom/species_phylum/species_class/species_order/species_family/species_genus/species_species, data = taxonomy)
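To show what the formula interface does on a small scale, here is a sketch that builds a tree from a four-species taxonomy table using ape. The table is a simplified illustration with only three ranks, and the species labels use underscores instead of spaces, which is a choice made here to keep the tip labels simple.

```r
library(ape)

# Small illustrative taxonomy table with three ranks
demo_taxonomy <- data.frame(
  phylum  = c("Chordata", "Chordata", "Arthropoda", "Arthropoda"),
  class   = c("Actinopterygii", "Actinopterygii", "Branchiopoda", "Malacostraca"),
  species = c("Danio_rerio", "Oryzias_latipes", "Daphnia_magna", "Procambarus_clarkii")
)
demo_taxonomy[] <- lapply(demo_taxonomy, factor)   # as in the chunk above, convert to factors

# Ranks are written from the broadest to the narrowest, separated by "/"
demo_tree <- as.phylo(~phylum/class/species, data = demo_taxonomy)

Ntip(demo_tree)        # number of tips (one per species)
demo_tree$tip.label    # tip labels taken from the narrowest rank
```

The resulting tree encodes only the nesting of the ranks, not evolutionary distances, which is why the plot that follows draws it without branch lengths.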
Manually creating a ggtree object from phylo_tree (with equal branch lengths)
ggtree_obj <- ggtree(phylo_tree, branch.length='none', layout='circular')
# Extract the class information for coloring
#taxonomy$label <- paste0(substr(spp_taxonomy$species_genus, 1, 1), ". ", spp_taxonomy$species_species)
class_info <- taxonomy$species_class[match(phylo_tree$tip.label, taxonomy$species_species)]
# Add the class information to the ggtree object
ggtree_obj <- ggtree_obj %<+% data.frame(label = phylo_tree$tip.label, class = class_info)
# Create a color vector for the class levels
class_colors <- rainbow(length(unique(class_info)))
names(class_colors) <- unique(class_info)
# Plot the cladogram with colored branches
spp_cladogram <- ggtree_obj +
geom_tiplab(size=3) + #If you want to add species names
geom_tree(aes(color=class)) +
scale_color_manual(values = class_colors) +
theme(legend.position = "right")
spp_cladogram
setwd(figures_path)
ggsave("spp_cladogram.pdf", plot = spp_cladogram, width = 8, height = 5)
Let’s group by class to see the major taxonomic classes used
First, cases where species_species was “spp.” are replaced with NA for taxonomic classification
EIPAAB_database <- EIPAAB_database %>%
dplyr::mutate(species_species = if_else(species_species == "spp.", NA, species_species))
Let’s look at the major classes
# Total number of spp
n_spp <- EIPAAB_database %>%
dplyr::distinct(species_name) %>%
nrow()
# Number of spp per Class and per phylum
spp_classes <- EIPAAB_database %>%
dplyr::group_by(species_class, species_phylum) %>%
dplyr::reframe(count_class = length(unique(species_name)),
percent_class = round(count_class/n_spp*100,1)) %>%
dplyr::group_by(species_phylum) %>%
dplyr::mutate(count_phylum = sum(count_class),
percent_phylum = round(count_phylum/n_spp*100,1)) %>%
dplyr::ungroup() %>%
dplyr::arrange(desc(percent_class))
spp_classes %>%
dplyr::slice(1:10)
## # A tibble: 10 × 6
## species_class species_phylum count_class percent_class count_phylum
## <chr> <chr> <int> <dbl> <int>
## 1 Actinopterygii Chordata 71 41 87
## 2 Malacostraca Arthropoda 21 12.1 42
## 3 Gastropoda Mollusca 19 11 28
## 4 Amphibia Chordata 12 6.9 87
## 5 Branchiopoda Arthropoda 10 5.8 42
## 6 Bivalvia Mollusca 8 4.6 28
## 7 Insecta Arthropoda 8 4.6 42
## 8 Rhabditophora Platyhelminthes 5 2.9 6
## 9 Reptilia Chordata 3 1.7 87
## 10 Copepoda Arthropoda 2 1.2 42
## # ℹ 1 more variable: percent_phylum <dbl>
Making a figure for the 15 most diverse classes, in terms of the number of species in the EIPAAB database
class_n_spp_fig <- spp_classes %>%
dplyr::arrange(desc(percent_class)) %>% # arrange the dataset
dplyr::slice(1:15) %>% # Take only the most diverse 15 Class
dplyr::mutate(species_class = fct_reorder(species_class, percent_class), # Order by diversity
species_phylum = fct_reorder(species_phylum, desc(percent_phylum))) %>% # Order by diversity
ggplot(aes(x=species_class, y=count_class, color = species_phylum)) +
geom_segment(aes(x=species_class, xend=species_class, y=0, yend=count_class)) +
geom_point(size=4) +
geom_text(aes(label = count_class),
hjust=-1.2,
size=3.5,
color="black") +
scale_colour_brewer(palette= "Dark2") +
coord_flip() +
ylim(0, 75) +
theme_classic() +
labs(
x = "",
y = "Number of distinct species in the database"
) +
theme(legend.position = c(0., 0.05), # Positioning the legend inside the plot
legend.justification = c(-3, 0), # Bottom left inside the plot
legend.box.just = "right",
legend.background = element_rect(fill=alpha('white', 0.5))
)
class_n_spp_fig
Save the figure
setwd(figures_path)
ggsave("spp_class_n_spp_fig.pdf", plot = class_n_spp_fig, width = 5, height = 5)
Here’s a look at the % at the phylum level
# In the class summary we also included percent_phylum
spp_classes %>%
dplyr::group_by(species_phylum) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::select(-species_class, -count_class, -percent_class) %>%
dplyr::arrange(desc(percent_phylum))
## # A tibble: 8 × 3
## species_phylum count_phylum percent_phylum
## <chr> <int> <dbl>
## 1 Chordata 87 50.3
## 2 Arthropoda 42 24.3
## 3 Mollusca 28 16.2
## 4 Platyhelminthes 6 3.5
## 5 Annelida 3 1.7
## 6 Echinodermata 3 1.7
## 7 Cnidaria 2 1.2
## 8 Rotifera 2 1.2
A ring chart at the phylum level
ring_plot_df <- spp_classes %>%
dplyr::slice(1:15) %>%
dplyr::group_by(species_phylum) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::select(species_phylum, count_phylum) %>%
dplyr::mutate(percent_phylum = count_phylum/sum(count_phylum)) %>%
dplyr::arrange(desc(percent_phylum))
# Compute the cumulative percentages (top of each rectangle)
ring_plot_df$ymax = cumsum(ring_plot_df$percent_phylum)
# Compute the bottom of each rectangle
ring_plot_df$ymin = c(0, head(ring_plot_df$ymax, n=-1))
# Compute label position
ring_plot_df$labelPosition <- (ring_plot_df$ymax + ring_plot_df$ymin) / 2
# Compute a good label
ring_plot_df$label <- paste0(ring_plot_df$species_phylum, "\n (n = ", ring_plot_df$count_phylum, ")")
phylum_ring_fig <- ring_plot_df %>%
dplyr::mutate(species_phylum = fct_reorder(species_phylum, desc(percent_phylum))) %>%
ggplot(aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3, fill=species_phylum)) +
geom_rect() +
coord_polar(theta="y") +
geom_label(x=5, aes(y=labelPosition, label=label), size=3, alpha = 0.8) +
scale_fill_brewer(palette= "Dark2") +
xlim(c(2, 5)) +
theme_void() +
theme(legend.position = "none")
phylum_ring_fig
setwd(figures_path)
ggsave("spp_phylum_ring_fig.pdf", plot = phylum_ring_fig, width = 5, height = 5)
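To make the ring chart geometry easier to follow, here's a small worked example with made-up counts (the phylum names and numbers are illustrative only). Each segment runs from ymin to ymax on a 0 to 1 scale, and the label sits at the midpoint.

```r
# Toy phylum counts (made-up numbers, for illustration only)
toy_ring <- data.frame(
  phylum = c("A", "B", "C"),
  count  = c(50, 30, 20)
)

# Proportion of the whole ring taken up by each phylum
toy_ring$percent <- toy_ring$count / sum(toy_ring$count)

# Top of each rectangle is the running total of the proportions
toy_ring$ymax <- cumsum(toy_ring$percent)

# Bottom of each rectangle is the top of the previous one (0 for the first)
toy_ring$ymin <- c(0, head(toy_ring$ymax, n = -1))

# Labels go halfway between the bottom and the top
toy_ring$labelPosition <- (toy_ring$ymax + toy_ring$ymin) / 2

toy_ring
```

The last ymax should always equal 1, which is a quick check that the segments close the ring.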
Now we will look at how many times each phylum, class, order, family, genus, species appear in the database
First we will make a dataset that counts the number of species within each phylum, class, order, family, and genus.
# Step 1: Separate and pivot the data
lineage_data <- EIPAAB_database %>%
pivot_longer(cols = c("species_phylum", "species_class", "species_order",
"species_family", "species_genus", "species_species"),
names_to = "lineage_level", values_to = "classification") %>%
dplyr::mutate(lineage_level = str_remove(lineage_level, "species_"))
# Define the order
lineage_levels_order <- c("phylum", "class", "order", "family", "genus", "species")
# Step 2: Create parent-child relationships
lineage_data <- lineage_data %>%
group_by(unique_row_id) %>%
mutate(parent = case_when(
lineage_level == "phylum" ~ "Animalia",
lineage_level == "class" ~ lag(classification, 1),
lineage_level == "order" ~ lag(classification, 1),
lineage_level == "family" ~ lag(classification, 1),
lineage_level == "genus" ~ lag(classification, 1),
lineage_level == "species" ~ lag(classification, 1),
)) %>%
ungroup()
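Here's a toy version of the parent-child step, to show what lag() is doing. It is only a sketch: the two-row table is invented for illustration, and it assumes the taxonomic columns are listed from broadest to narrowest, because lag() simply takes the value from the previous row within each unique_row_id.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy stand-in for EIPAAB_database with three taxonomic levels
toy_db <- tibble(
  unique_row_id  = c(1, 2),
  species_phylum = c("Chordata", "Arthropoda"),
  species_class  = c("Actinopterygii", "Branchiopoda"),
  species_order  = c("Cypriniformes", "Diplostraca")
)

toy_lineage <- toy_db %>%
  # One row per taxonomic level, kept in column order (phylum, class, order)
  pivot_longer(cols = starts_with("species_"),
               names_to = "lineage_level", values_to = "classification") %>%
  mutate(lineage_level = str_remove(lineage_level, "species_")) %>%
  group_by(unique_row_id) %>%
  # The parent is the level above; phylum has no row above, so it gets "Animalia"
  mutate(parent = if_else(lineage_level == "phylum", "Animalia", lag(classification))) %>%
  ungroup()

toy_lineage
```

If the columns were pivoted in a different order, the parents would be wrong, so it is worth checking a few rows by eye.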
Here we sum the total number of species used in the database across each taxonomic classification
n_rows <- EIPAAB_database %>%
nrow()
lineage_count_use <- lineage_data %>%
dplyr::group_by(classification, lineage_level, parent) %>%
dplyr::reframe(classification_count = length(unique_row_id),
classification_percent = round(classification_count/n_rows*100,1))
Making a plot of the 15 most commonly used classes. You can do the same at any of the other taxonomic levels by changing the lineage_level filter
class_use_fig <- lineage_count_use %>%
dplyr::filter(lineage_level == "class") %>%
dplyr::arrange(desc(classification_percent)) %>%
dplyr::slice(1:15) %>%
dplyr::mutate(classification = fct_reorder(classification, classification_percent),
parent = fct_reorder(parent, desc(classification_percent))) %>%
ggplot(aes(x=classification, y=classification_percent, color = parent)) +
geom_segment(aes(x=classification, xend=classification, y=0, yend=classification_percent)) +
geom_point(size=4) +
geom_text(aes(label = paste0(round(classification_percent,2), "%")),
hjust=-0.5,
size=3.5,
color="black") +
scale_colour_brewer(palette= "Dark2") +
coord_flip() +
ylim(0, 100) +
theme_classic() +
labs(
x = "",
y = "Percentage representation in the database"
) +
theme(legend.position = c(0., 0.05), # Positioning the legend inside the plot
legend.justification = c(-3, 0), # Bottom left inside the plot
legend.box.just = "right",
legend.background = element_rect(fill=alpha('white', 0.5))
)
class_use_fig
setwd(figures_path)
ggsave("spp_class_use_fig.pdf", plot = class_use_fig, width = 5, height = 5)
lineage_count_use %>%
dplyr::filter(lineage_level == "class") %>%
dplyr::group_by(classification) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::arrange(desc(classification_percent))
## # A tibble: 21 × 5
## classification lineage_level parent classification_count
## <chr> <chr> <chr> <int>
## 1 Actinopterygii class Chordata 1312
## 2 Branchiopoda class Arthropoda 130
## 3 Gastropoda class Mollusca 69
## 4 Malacostraca class Arthropoda 61
## 5 Amphibia class Chordata 44
## 6 Rhabditophora class Platyhelminthes 29
## 7 Bivalvia class Mollusca 25
## 8 Hydrozoa class Cnidaria 22
## 9 Insecta class Arthropoda 17
## 10 Cephalopoda class Mollusca 7
## # ℹ 11 more rows
## # ℹ 1 more variable: classification_percent <dbl>
Here’s a breakdown by phylum
ring_use_plot_df <- lineage_count_use %>%
dplyr::filter(lineage_level == "phylum") %>%
dplyr::arrange(desc(classification_percent)) %>%
dplyr::mutate(classification_percent = round(classification_percent, 2)) %>%
dplyr::slice(1:8)
# Compute the cumulative percentages (top of each rectangle)
ring_use_plot_df$ymax = cumsum(ring_use_plot_df$classification_percent)
# Compute the bottom of each rectangle
ring_use_plot_df$ymin = c(0, head(ring_use_plot_df$ymax, n=-1))
# Compute label position
ring_use_plot_df$labelPosition <- (ring_use_plot_df$ymax + ring_use_plot_df$ymin) / 2
# Compute a good label
ring_use_plot_df$label <- paste0(ring_use_plot_df$classification, "\n (", ring_use_plot_df$classification_percent, "%)")
phylum_use_ring_fig <- ring_use_plot_df %>%
dplyr::mutate(classification = fct_reorder(classification, desc(classification_percent))) %>%
ggplot(aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3, fill=classification)) +
geom_rect() +
coord_polar(theta="y") +
geom_label(x=5, aes(y=labelPosition, label=label), size=3, alpha = 0.8) +
scale_fill_brewer(palette= "Dark2") +
xlim(c(2, 5)) +
theme_void()
#theme(legend.position = "none")
phylum_use_ring_fig
setwd(figures_path)
ggsave("spp_phylum_use_ring_fig.pdf", plot = phylum_use_ring_fig, width = 5, height = 5)
Making a data set that looks at relative representation by each motivation, and the total
lineage_count_use_motivation <- lineage_data %>%
dplyr::group_by(study_motivation, lineage_level, parent, classification) %>%
dplyr::reframe(classification_count = length(unique_row_id)) %>%
dplyr::group_by(study_motivation, lineage_level) %>%
dplyr::mutate(n_motivation = sum(classification_count)) %>%
dplyr::ungroup() %>%
dplyr::mutate(classification_percent = (classification_count/n_motivation)*100)
lineage_count_use_all <- lineage_data %>%
dplyr::group_by(lineage_level, parent, classification) %>%
dplyr::reframe(classification_count = length(unique_row_id)) %>%
dplyr::group_by(lineage_level) %>%
dplyr::mutate(n_motivation = sum(classification_count)) %>%
dplyr::ungroup() %>%
dplyr::mutate(classification_percent = (classification_count/n_motivation)*100) %>%
dplyr::mutate(study_motivation = "All")
lineage_count_use_motivation <- lineage_count_use_motivation %>%
rbind(., lineage_count_use_all) %>%
dplyr::mutate(study_motivation = fct_relevel(study_motivation, "All", "Environmental", "Medical", "Basic research"))
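Here's a toy sketch of the same idea: a percentage within each motivation, plus an extra "All" group that ignores motivation. The data are made up, and count() and bind_rows() are used here as shorthand for the group_by()/reframe() and rbind() steps above; this is an assumed equivalent, not the original code.

```r
library(dplyr)

# Toy data: one row per database record (made-up values)
toy_use <- tibble(
  study_motivation = c("Environmental", "Environmental", "Medical", "Medical", "Medical"),
  classification   = c("Actinopterygii", "Gastropoda", "Actinopterygii",
                       "Actinopterygii", "Actinopterygii")
)

# Percent of records per class within each motivation
by_motivation <- toy_use %>%
  count(study_motivation, classification, name = "classification_count") %>%
  group_by(study_motivation) %>%
  mutate(classification_percent = classification_count / sum(classification_count) * 100) %>%
  ungroup()

# The same calculation ignoring motivation, labelled "All"
overall <- toy_use %>%
  count(classification, name = "classification_count") %>%
  mutate(classification_percent = classification_count / sum(classification_count) * 100,
         study_motivation = "All")

toy_combined <- bind_rows(by_motivation, overall)
toy_combined
```

A useful check is that the percentages sum to 100 within every study_motivation group, including "All".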
Here’s a tile plot to compare the use of different taxa across the study motivations
class_order_df <- lineage_count_use %>%
dplyr::filter(lineage_level == "class") %>%
dplyr::arrange(classification_percent) %>%
dplyr::mutate(class_order = 1:nrow(.)) %>%
dplyr::select(classification, class_order)
class_use_motivation_fig <- lineage_count_use_motivation %>%
dplyr::filter(lineage_level == "class") %>%
dplyr::full_join(., class_order_df, by = "classification") %>%
dplyr::mutate(classification = fct_reorder(classification, class_order),
parent = fct_reorder(parent, desc(class_order))) %>%
ggplot(aes(x = study_motivation, y = classification, fill = classification_percent)) +
geom_tile() +
geom_text(aes(label = paste0(round(classification_percent, 1),"%"))) +
scale_fill_gradient(name = expression("Relative\nabundance (%)"),
low = "#FFFFFF", high = "#231F20") +
theme_classic() +
# theme(axis.text.x = element_text(angle = 90, hjust = 1.1, vjust = 0.4)) +
labs(x = "Study motivation",
y = "Taxonomic class")
class_use_motivation_fig
setwd(figures_path)
ggsave("spp_class_use_motivation_fig.pdf", plot = class_use_motivation_fig, width = 5, height = 5)
Let’s see what the most common species were. This is calculated at the population level using unique_population_id (i.e. a species is not counted multiple times if multiple compounds were used in a single article)
n_total <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>%
dplyr::reframe(n = length(unique_population_id)) %>%
nrow(.)
n_spp <- EIPAAB_database %>%
dplyr::group_by(unique_population_id, species_ncbi_taxonomy_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(species_name, species_ncbi_taxonomy_id) %>%
dplyr::reframe(n = length(species_name),
percent = round(n/n_total*100,1)) %>%
dplyr::arrange(desc(n)) %>%
dplyr::mutate(spp_number = 1:nrow(.))
n_spp
## # A tibble: 173 × 5
## species_name species_ncbi_taxonomy_id n percent spp_number
## <chr> <chr> <int> <dbl> <int>
## 1 Danio rerio NCBI:txid7955 412 44.1 1
## 2 Daphnia magna NCBI:txid35525 54 5.8 2
## 3 Pimephales promelas NCBI:txid90988 32 3.4 3
## 4 Betta splendens NCBI:txid158456 26 2.8 4
## 5 Poecilia reticulata NCBI:txid8081 26 2.8 5
## 6 Gambusia holbrooki NCBI:txid37273 18 1.9 6
## 7 Carassius auratus NCBI:txid7957 16 1.7 7
## 8 Oryzias latipes NCBI:txid8090 16 1.7 8
## 9 Gasterosteus aculeatus NCBI:txid69293 14 1.5 9
## 10 Oncorhynchus mykiss NCBI:txid8022 12 1.3 10
## # ℹ 163 more rows
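The group_by() plus sample_n(1) pattern keeps one random row per population. Here's a minimal sketch of a deterministic alternative using distinct(), which keeps the first row instead. The toy table and the compound_name column are made up for illustration; the counts come out the same either way because only the species name is used afterwards.

```r
library(dplyr)

# Toy stand-in for EIPAAB_database (compound_name is a hypothetical column)
toy_db <- tibble(
  unique_population_id = c("a1_zebrafish", "a1_zebrafish", "a2_zebrafish", "a2_daphnia"),
  species_name         = c("Danio rerio", "Danio rerio", "Danio rerio", "Daphnia magna"),
  compound_name        = c("fluoxetine", "sertraline", "fluoxetine", "fluoxetine")
)

# Total number of populations (one per article-species combination)
toy_n_total <- toy_db %>%
  distinct(unique_population_id) %>%
  nrow()

# One row per population, then count populations per species
toy_n_spp <- toy_db %>%
  distinct(unique_population_id, .keep_all = TRUE) %>%
  count(species_name, name = "n") %>%
  mutate(percent = round(n / toy_n_total * 100, 1)) %>%
  arrange(desc(n))

toy_n_spp
```

Article a1 tested two compounds on zebrafish, but it only contributes one population to the zebrafish count.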
Making a broad category of abundance (article_n_group) to make the summary and figure more digestible
n_spp_fig_data <- n_spp %>%
dplyr::mutate(species_ncbi_taxonomy_id = fct_reorder(species_ncbi_taxonomy_id, desc(n))) %>%
dplyr::mutate(
article_n_group = case_when(
n == 1 ~ "One only",
n >= 2 & n <= 5 ~ "Between 2 and 5",
n >= 6 & n <= 10 ~ "Between 6 and 10", # non-overlapping bounds; same result as before
n > 10 ~ "Greater than 10",
TRUE ~ "Others"
)
)
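The same binning can also be sketched with base R's cut(), which avoids writing each condition by hand. This assumes n is a whole number of at least 1; the example counts below are made up to hit every boundary.

```r
# Made-up article counts, chosen to test the edges of each bin
n <- c(1, 2, 5, 6, 10, 11, 412)

# right = TRUE gives the intervals (0,1], (1,5], (5,10], (10,Inf]
article_n_group <- cut(
  n,
  breaks = c(0, 1, 5, 10, Inf),
  labels = c("One only", "Between 2 and 5", "Between 6 and 10", "Greater than 10"),
  right = TRUE
)

data.frame(n = n, article_n_group = article_n_group)
```

cut() returns a factor whose levels follow the order of the labels, so it may still need releveling before plotting.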
This is the number of species in each category
article_n_group_summary <- n_spp_fig_data %>%
dplyr::group_by(article_n_group) %>%
dplyr::reframe(n_cat = length(species_name))
article_n_group_summary
## # A tibble: 4 × 2
## article_n_group n_cat
## <chr> <int>
## 1 Between 2 and 5 53
## 2 Between 6 and 10 6
## 3 Greater than 10 11
## 4 One only 103
Making a plot to show the distribution of species use in the EIPAAB database
n_spp_fig <- n_spp_fig_data %>%
dplyr::mutate(article_n_group = fct_relevel(article_n_group, "Greater than 10", "Between 6 and 10", "Between 2 and 5", "One only")) %>%
ggplot(aes(y = n, x = spp_number, color = article_n_group)) +
geom_line(linewidth = 1, alpha = 0.2) +
geom_point(stat = "identity", size = 1, alpha = 0.8) +
scale_color_manual(
values = c(
"One only" = "#E94039",
"Between 2 and 5" = "#F18E76",
"Between 6 and 10" = "#877FBC",
"Greater than 10" = "#4D479D"
),
labels = c(
"One only" = "One only (n = 103)",
"Between 2 and 5" = "Between 2 and 5 (n = 53)",
"Between 6 and 10" = "Between 6 and 10 (n = 6)",
"Greater than 10" = "Greater than 10 (n = 11)"
)
) + # Set colours for each category
theme_classic() +
theme(
legend.position = c(-0.3, 0.4), # Positioning the legend inside the plot
legend.justification = c(-3, 0), # Bottom left inside the plot
legend.box.just = "right"
) +
labs(
x = paste0("Species (1-", nrow(n_spp_fig_data), ")"),
y = "Number of articles",
color = "Article number category"
)
n_spp_fig
setwd(figures_path)
ggsave("spp_n_spp_fig.pdf", plot = n_spp_fig, width = 5, height = 5)
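Hard-coding the counts in the legend labels means they need updating by hand whenever the database changes. Here's a minimal sketch, assuming a summary table shaped like article_n_group_summary, that builds the labels from the counts instead. The toy counts are made up.

```r
# Toy version of article_n_group_summary (made-up counts)
toy_summary <- data.frame(
  article_n_group = c("Between 2 and 5", "One only"),
  n_cat           = c(3, 7)
)

# Named character vector: names are the categories, values are the legend labels
toy_labels <- setNames(
  paste0(toy_summary$article_n_group, " (n = ", toy_summary$n_cat, ")"),
  toy_summary$article_n_group
)

toy_labels
```

A named vector like this can be passed to the labels argument of scale_color_manual() in place of the hand-written one.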
Making a list of the most common 15 species
n_spp_used <- EIPAAB_database %>%
dplyr::distinct(unique_population_id) %>%
nrow(.)
top_15_spp_list <- EIPAAB_database %>%
dplyr::group_by(species_name) %>%
dplyr::summarise(n = length(unique(unique_population_id)), .groups = 'drop') %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:15) %>%
dplyr::pull(species_name) %>%
as.list()
Making a summary dataframe based on study motivation
common_species <- EIPAAB_database %>%
dplyr::filter(species_name %in% top_15_spp_list) %>%
dplyr::group_by(species_name, study_motivation) %>%
dplyr::summarise(n = length(unique(article_id)), .groups = 'drop') %>%
tidyr::complete(species_name, study_motivation, fill = list(n = 0))
species_order <- common_species %>%
group_by(species_name) %>%
summarise(total_n = sum(n), .groups = 'drop') %>%
arrange(desc(total_n)) %>%
ungroup()
common_species <- common_species %>%
inner_join(species_order, by = "species_name") %>%
mutate(species_name = fct_reorder(species_name, total_n),
study_motivation = fct_relevel(study_motivation, "Environmental", "Medical", "Basic research"))
A plot of the number of articles in which each of the 15 most common species appeared within the EIPAAB database, by study motivation. It’s a little hard to see in the chunk output, so try viewing it in an external window.
top_15_spp_fig <- common_species %>%
ggplot(aes(x=species_name, y=n, colour = study_motivation, fill = study_motivation, group = study_motivation)) +
geom_col(position = position_dodge(width = 0.8), width = 0.1) +
geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = n), hjust=-0.6, size=3.5, color="black", position = position_dodge(width = 0.8)) +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
coord_flip() +
theme_classic() +
labs(
x = "",
y = "Number of studies"
) +
theme()
top_15_spp_fig
setwd(figures_path)
ggsave("spp_top_15_spp_fig.pdf", plot = top_15_spp_fig, width = 5, height = 10)
Summarising the top 10 in each motivation more specifically
spp_moitivation_summary <- EIPAAB_database %>%
dplyr::group_by(species_name, study_motivation) %>%
dplyr::summarise(n = length(unique(unique_population_id)), .groups = 'drop') %>%
tidyr::complete(species_name, study_motivation, fill = list(n = 0)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(prop = n/total) %>%
dplyr::select(-total)
This plot shows the top 10 in each motivation
top_10_env_spp <- spp_moitivation_summary %>%
dplyr::filter(study_motivation == "Environmental") %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::mutate(species_name = fct_reorder(species_name, n)) %>%
ggplot(aes(x=species_name, y=n)) +
geom_col(width = 0.01, colour = "#60BD6C", fill = "#60BD6C") +
geom_point(size = 2, colour = "#60BD6C", fill = "#60BD6C") +
geom_text(aes(label = n), hjust=-0.6, size=3.5, color="black") +
coord_flip() +
theme_classic() +
labs(
x = "",
y = "Number of studies",
title = "Environmental"
) +
theme(
plot.title = element_text(size = 12)
)
top_10_med_spp <- spp_moitivation_summary %>%
dplyr::filter(study_motivation == "Medical") %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::mutate(species_name = fct_reorder(species_name, n)) %>%
ggplot(aes(x=species_name, y=n)) +
geom_col(width = 0.01, colour = "#D359A1", fill = "#D359A1") +
geom_point(size = 2, colour = "#D359A1", fill = "#D359A1") +
geom_text(aes(label = n), hjust=-0.6, size=3.5, color="black") +
coord_flip() +
theme_classic() +
labs(
x = "",
y = "Number of studies",
title = "Medical"
) +
theme(
plot.title = element_text(size = 12)
)
top_10_base_spp <- spp_moitivation_summary %>%
dplyr::filter(study_motivation == "Basic research") %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::mutate(species_name = fct_reorder(species_name, n)) %>%
ggplot(aes(x=species_name, y=n)) +
geom_col(width = 0.01, colour = "#3C82C4", fill = "#3C82C4") +
geom_point(size = 2, colour = "#3C82C4", fill = "#3C82C4") +
geom_text(aes(label = n), hjust=-0.6, size=3.5, color="black") +
coord_flip() +
theme_classic() +
labs(
x = "",
y = "Number of studies",
title = "Basic"
) +
theme(
plot.title = element_text(size = 12)
)
Here’s a plot to compare the top 10 spp in each study motivation more specifically
top_10_combind_plot <- grid.arrange(top_10_env_spp, top_10_med_spp, top_10_base_spp, ncol = 3)
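The three top 10 plots differ only in the motivation, the colour, and the title, so they could be wrapped in one function. This is a sketch only: plot_top_species() is a hypothetical helper that is not part of the original script, and the toy summary table is made up.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical helper: lollipop plot of the most used species for one motivation
plot_top_species <- function(summary_df, motivation, point_colour,
                             title = motivation, n_top = 10) {
  summary_df %>%
    filter(study_motivation == motivation) %>%
    arrange(desc(n)) %>%
    slice_head(n = n_top) %>% # keeps fewer rows if fewer are available
    mutate(species_name = fct_reorder(species_name, n)) %>%
    ggplot(aes(x = species_name, y = n)) +
    geom_col(width = 0.01, colour = point_colour, fill = point_colour) +
    geom_point(size = 2, colour = point_colour) +
    geom_text(aes(label = n), hjust = -0.6, size = 3.5, color = "black") +
    coord_flip() +
    theme_classic() +
    labs(x = "", y = "Number of studies", title = title) +
    theme(plot.title = element_text(size = 12))
}

# Toy summary with the same columns as the motivation summary (made-up counts)
toy_summary <- tibble(
  species_name     = c("Danio rerio", "Daphnia magna", "Danio rerio", "Daphnia magna"),
  study_motivation = c("Environmental", "Environmental", "Medical", "Medical"),
  n                = c(5, 3, 9, 0)
)

toy_env_fig <- plot_top_species(toy_summary, "Environmental", "#60BD6C")
toy_env_fig
```

The three panels could then be built with three calls and combined with grid.arrange() as before.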
Looking at the number of distinct species used in each motivation group
EIPAAB_database %>%
dplyr::group_by(study_motivation) %>%
dplyr::reframe(n_motivation = n_distinct(article_id),
n_distinct_spp = n_distinct(species_name))
## # A tibble: 3 × 3
## study_motivation n_motivation n_distinct_spp
## <fct> <int> <int>
## 1 Environmental 510 143
## 2 Medical 234 26
## 3 Basic research 158 43
First checking how many species have IUCN data, which we will use to assess habitat differences
species_iucn_summary <- EIPAAB_database %>%
dplyr::group_by(species_name) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::mutate(species_iucn_bin = if_else(is.na(species_iucn_doi), "No", "Yes")) %>%
dplyr::group_by(species_iucn_bin) %>%
dplyr::reframe(n = length(species_iucn_bin))
species_iucn_summary
## # A tibble: 2 × 2
## species_iucn_bin n
## <chr> <int>
## 1 No 68
## 2 Yes 105
Summarising the IUCN habitat type. Some species have multiple habitats, so we split the string by the separator (;)
habitat_summary <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::group_by(species_iucn_habitat) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(species_iucn_habitat = str_trim(species_iucn_habitat)) %>%
tidyr::separate_rows(species_iucn_habitat, sep = ";") %>% # each spp can have multiple habitats, so the string needs splitting
dplyr::group_by(species_iucn_habitat) %>%
dplyr::reframe(n_articles = sum(n)) %>% # now a sum for each habitat
arrange(desc(n_articles))
habitat_summary
## # A tibble: 13 × 2
## species_iucn_habitat n_articles
## <chr> <int>
## 1 Wetlands inland 738
## 2 Artificial or Aquatic and Marine 187
## 3 <NA> 164
## 4 Marine Neritic 99
## 5 Marine Coastal or Supratidal 44
## 6 Marine Intertidal 27
## 7 Forest 23
## 8 Grassland 22
## 9 Artificial or Terrestrial 21
## 10 Shrubland 18
## 11 Savanna 13
## 12 Marine Oceanic 10
## 13 Unknown 2
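Here's a toy example of the split-and-sum step, to show how a species with several habitats contributes to each of them. The data are made up. As an assumption for illustration, the trimming is done after the split, which would also cope with a space after the separator.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy counts per habitat string (made-up values); NA means no IUCN habitat
toy_habitats <- tibble(
  species_iucn_habitat = c("Wetlands inland;Marine Neritic", "Wetlands inland", NA),
  n                    = c(2, 5, 1)
)

toy_habitat_summary <- toy_habitats %>%
  # One row per habitat; the count n is copied to each new row
  separate_rows(species_iucn_habitat, sep = ";") %>%
  mutate(species_iucn_habitat = str_trim(species_iucn_habitat)) %>%
  group_by(species_iucn_habitat) %>%
  summarise(n_articles = sum(n), .groups = "drop") %>%
  arrange(desc(n_articles))

toy_habitat_summary
```

Because one record can land in several habitats, the n_articles column can add up to more than the number of populations.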
Checking how many freshwater vs marine species there are. The Wetlands inland category covers freshwater bodies, whereas Marine spans multiple categories (Marine Neritic, Marine Coastal or Supratidal, Marine Intertidal, Marine Oceanic).
habitat_summary %>%
dplyr::filter(str_starts(species_iucn_habitat, "Marine") | species_iucn_habitat == "Wetlands inland") %>% # Only habitats of interest
dplyr::mutate(aquatic_type = if_else(species_iucn_habitat == "Wetlands inland", "Freshwater", "Marine")) %>% # New category
dplyr::group_by(aquatic_type) %>%
dplyr::reframe(n_articles = sum(n_articles)) %>% # Final sums
dplyr::ungroup() %>%
dplyr::mutate(n_total = sum(n_articles),
percent = round(n_articles/n_total*100,1))
## # A tibble: 2 × 4
## aquatic_type n_articles n_total percent
## <chr> <int> <int> <dbl>
## 1 Freshwater 738 918 80.4
## 2 Marine 180 918 19.6
Let’s break this up by study motivation
habitat_summary_all <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::group_by(study_motivation, species_iucn_habitat) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(species_iucn_habitat = str_trim(species_iucn_habitat)) %>%
tidyr::separate_rows(species_iucn_habitat, sep = ";") %>% # each spp can have multiple habitats, so the string needs splitting
dplyr::group_by(species_iucn_habitat, study_motivation) %>%
dplyr::reframe(n_articles = sum(n)) %>% # now a sum for each habitat
arrange(desc(n_articles))
habitat_summary_all
## # A tibble: 37 × 3
## species_iucn_habitat study_motivation n_articles
## <chr> <fct> <int>
## 1 Wetlands inland Environmental 384
## 2 Wetlands inland Medical 223
## 3 Artificial or Aquatic and Marine Environmental 135
## 4 Wetlands inland Basic research 131
## 5 <NA> Environmental 131
## 6 Marine Neritic Environmental 80
## 7 Artificial or Aquatic and Marine Basic research 41
## 8 Marine Coastal or Supratidal Environmental 33
## 9 <NA> Basic research 22
## 10 Marine Intertidal Environmental 19
## # ℹ 27 more rows
Selecting only habitats of interest and allocating to Freshwater or Marine
freshwater_marine <- habitat_summary_all %>%
dplyr::filter(str_starts(species_iucn_habitat, "Marine") | species_iucn_habitat == "Wetlands inland") %>% # Only habitats of interest
dplyr::mutate(aquatic_type = if_else(species_iucn_habitat == "Wetlands inland", "Freshwater", "Marine")) %>% # New category
dplyr::group_by(aquatic_type, study_motivation) %>%
dplyr::reframe(n_articles = sum(n_articles)) %>% # Final sums
dplyr::group_by(study_motivation) %>%
dplyr::mutate(n_cat = sum(n_articles)) %>%
dplyr::ungroup() %>%
dplyr::mutate(prop = n_articles/n_cat) # A proportion of those identify as freshwater vs marine
freshwater_marine
## # A tibble: 6 × 5
## aquatic_type study_motivation n_articles n_cat prop
## <chr> <fct> <int> <int> <dbl>
## 1 Freshwater Environmental 384 520 0.738
## 2 Freshwater Medical 223 236 0.945
## 3 Freshwater Basic research 131 162 0.809
## 4 Marine Environmental 136 520 0.262
## 5 Marine Medical 13 236 0.0551
## 6 Marine Basic research 31 162 0.191
aquatic_type_order <- c("Freshwater", "Marine")
# Calculate cumulative positions for text labels
freshwater_marine <- freshwater_marine %>%
dplyr::mutate(aquatic_type = factor(aquatic_type, levels = aquatic_type_order)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::arrange(desc(aquatic_type)) %>%
dplyr::mutate(cumulative_prop = cumsum(prop) - prop / 2)
# Define the colour theme for freshwater and marine habitats
color_theme <- c("#7FAB91", "#2A4A64")
# Create the plot
habitat_fig <- freshwater_marine %>%
mutate(study_motivation = fct_relevel(study_motivation, "Basic research", "Medical", "Environmental")) %>%
ggplot(aes(y = prop, x = study_motivation, fill = aquatic_type, group = aquatic_type)) +
geom_bar(stat = "identity", width = 0.9) +
geom_text(aes(label = round(prop, 2), y = cumulative_prop), color = "white", size = 3) +
scale_fill_manual(values = color_theme, name = "Habitat") +
theme_classic() +
theme(legend.position = "right") +
labs(
x = "Study motivation",
y = "Proportion of all species assigned to freshwater or marine habitat"
) +
coord_flip()
habitat_fig
setwd(figures_path)
ggsave("spp_habitat_fig.pdf", plot = habitat_fig, width = 10, height = 5)
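The text labels in the stacked bar are placed at the midpoint of each segment, which is the running total minus half of the segment's own proportion. Here's that arithmetic worked through with the Environmental proportions from the table above. The sketch assumes the segments are stacked with Marine first, which is why the code sorts by desc(aquatic_type) before taking the cumulative sum.

```r
# Environmental proportions, in stacking order (Marine first, then Freshwater)
prop <- c(Marine = 0.262, Freshwater = 0.738)

# Midpoint of each segment: top of the segment minus half its height
cumulative_prop <- cumsum(prop) - prop / 2

cumulative_prop
# Marine sits at 0.131 (halfway through 0 to 0.262)
# Freshwater sits at 0.631 (halfway through 0.262 to 1)
```

If the order of the rows does not match the order in which ggplot2 stacks the bars, the labels end up in the wrong segments.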
We should also consider how many records did not include IUCN reports, and thus habitat.
Let’s make a plot that shows how many didn’t have an assigned IUCN habitat, to add to the above figure.
no_habitat <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::mutate(species_iucn_bin = if_else(is.na(species_iucn_doi), "No", "Yes")) %>%
dplyr::group_by(species_iucn_bin) %>%
dplyr::reframe(n = length(species_iucn_bin))
n_total <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
nrow(.)
no_habitat <- no_habitat %>%
dplyr::mutate(prop = n/n_total)
no_habitat
## # A tibble: 2 × 3
## species_iucn_bin n prop
## <chr> <int> <dbl>
## 1 No 163 0.174
## 2 Yes 772 0.826
Here’s the plot
# Define the black and grey color theme
black_and_grey <- c("#BCBEC0", "#414042")
yes_order <- c("Yes", "No")
# Calculate cumulative positions for text labels
habitat_info_df_fig <- no_habitat %>%
dplyr::mutate(species_iucn_bin = factor(species_iucn_bin, levels = yes_order)) %>%
dplyr::arrange(desc(species_iucn_bin)) %>%
dplyr::mutate(cumulative_prop = cumsum(prop) - prop / 2)
habitat_info_fig <- habitat_info_df_fig %>%
dplyr::mutate(species_iucn_bin = factor(species_iucn_bin, levels = yes_order)) %>%
ggplot(aes(y = prop, x = 1, fill = species_iucn_bin)) +
geom_bar(stat = "identity", width = 0.1) +
geom_text(aes(label = round(prop, 2), y = cumulative_prop), color = "white") +
scale_fill_manual(values = black_and_grey, name = "Habitat") +
theme_classic() +
theme(legend.position = "right",
axis.text.x = element_blank(), # Remove x-axis text
axis.ticks.x = element_blank() # Remove x-axis ticks
) +
labs(
x = "",
y = "Proportion of species tested in the database"
)
habitat_info_fig
setwd(figures_path)
ggsave("spp_habitat_info_fig.pdf", plot = habitat_info_fig, width = 5, height = 10)
Let’s get an overall summary first, without Unknown or not specified life stages
EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::group_by(species_stage) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(species_stage = str_trim(species_stage)) %>%
tidyr::separate_rows(species_stage, sep = ";") %>% # a population can include multiple life stages, so the string needs splitting
dplyr::filter(species_stage != "Unknown or not specified") %>%
dplyr::group_by(species_stage) %>%
dplyr::reframe(n_articles = sum(n)) %>% # now a sum for each life stage
dplyr::mutate(total_stages = sum(n_articles)) %>%
dplyr::ungroup() %>%
dplyr::mutate(overall_percent = round(n_articles/total_stages*100,1))
## # A tibble: 4 × 4
## species_stage n_articles total_stages overall_percent
## <chr> <int> <int> <dbl>
## 1 Adult 443 831 53.3
## 2 Egg or embryo 46 831 5.5
## 3 Juvenile 123 831 14.8
## 4 Larvae 219 831 26.4
Let’s take a look at the spp life stages used in the EIPAAB database based on study motivation
stage_summary_all <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::group_by(study_motivation, species_stage) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(species_stage = str_trim(species_stage)) %>%
tidyr::separate_rows(species_stage, sep = ";") %>% # a population can include multiple life stages, so the string needs splitting
dplyr::group_by(species_stage, study_motivation) %>%
dplyr::reframe(n_total = sum(n)) %>% # now a sum for each life stage
dplyr::group_by(study_motivation) %>%
dplyr::mutate(n_motivation = sum(n_total)) %>%
dplyr::ungroup()
stage_summary_all
## # A tibble: 15 × 4
## species_stage study_motivation n_total n_motivation
## <chr> <fct> <int> <int>
## 1 Adult Environmental 236 586
## 2 Adult Medical 135 246
## 3 Adult Basic research 72 165
## 4 Egg or embryo Environmental 31 586
## 5 Egg or embryo Medical 11 246
## 6 Egg or embryo Basic research 4 165
## 7 Juvenile Environmental 92 586
## 8 Juvenile Medical 14 246
## 9 Juvenile Basic research 17 165
## 10 Larvae Environmental 127 586
## 11 Larvae Medical 64 246
## 12 Larvae Basic research 28 165
## 13 Unknown or not specified Environmental 100 586
## 14 Unknown or not specified Medical 22 246
## 15 Unknown or not specified Basic research 44 165
For life stages that are described, what is the breakdown
stage_order <- c("Adult", "Juvenile", "Larvae", "Egg or embryo")
# Define the colour theme for life stages
color_theme <- c("#A14323", "#D86C2F", "#EE9E5A", "#F3E9A5")
# Making proportion for those that did report
stage_summary_described <- stage_summary_all %>%
dplyr::filter(species_stage %in% stage_order) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(n_motivation = sum(n_total)) %>%
dplyr::ungroup() %>%
dplyr::mutate(prop = n_total/n_motivation)
# Calculate cumulative positions for text labels
stage_summary_described <- stage_summary_described %>%
dplyr::mutate(species_stage = factor(species_stage, levels = stage_order)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::arrange(desc(species_stage)) %>%
dplyr::mutate(cumulative_prop = cumsum(prop) - prop / 2) # midpoint of each segment, as in the other stacked bar plots
# Create the plot
life_stage_fig <- stage_summary_described %>%
mutate(study_motivation = fct_relevel(study_motivation, "Basic research", "Medical", "Environmental")) %>%
ggplot(aes(y = prop, x = study_motivation, fill = species_stage, group = species_stage)) +
geom_bar(stat = "identity", width = 0.9) +
geom_text(aes(label = round(prop, 2), y = cumulative_prop), color = "white", size = 3) +
scale_fill_manual(values = color_theme, name = "Life stage") +
theme_classic() +
theme(legend.position = "right") +
labs(
x = "Study motivation",
y = "Proportion of all species assigned to a life stage"
) +
coord_flip()
life_stage_fig
setwd(figures_path)
ggsave("spp_life_stage_fig.pdf", plot = life_stage_fig, width = 10, height = 5)
Let’s also look at how many were unknown or not described
stage_summary_info <- stage_summary_all %>%
dplyr::mutate(stage_reported = if_else(species_stage == "Unknown or not specified", "No", "Yes")) %>%
dplyr::group_by(stage_reported) %>%
dplyr::reframe(n = sum(n_total))
n_total <- stage_summary_info %>%
dplyr::reframe(n_total = sum(n)) %>%
pull(n_total)
stage_summary_info <- stage_summary_info %>%
dplyr::mutate(prop = n/n_total)
Here’s the plot
# Define the black and grey color theme
black_and_grey <- c("#BCBEC0", "#414042")
yes_order <- c("Yes", "No")
# Calculate cumulative positions for text labels
stage_summary_info <- stage_summary_info %>%
dplyr::mutate(stage_reported = factor(stage_reported, levels = yes_order)) %>%
dplyr::arrange(desc(stage_reported)) %>%
dplyr::mutate(cumulative_prop = cumsum(prop) - prop / 2)
stage_info_fig <- stage_summary_info %>%
dplyr::mutate(stage_reported = factor(stage_reported, levels = yes_order)) %>%
ggplot(aes(y = prop, x = 1, fill = stage_reported)) +
geom_bar(stat = "identity", width = 0.9) +
geom_text(aes(label = round(prop, 2), y = cumulative_prop), color = "white") +
scale_fill_manual(values = black_and_grey, name = "Stage reported") +
theme_classic() +
theme(legend.position = "right",
axis.text.x = element_blank(), # Remove x-axis text
axis.ticks.x = element_blank() # Remove x-axis ticks
) +
labs(
x = "",
y = "Proportion of species tested in the database"
)
stage_info_fig
setwd(figures_path)
ggsave("spp_stage_info_fig.pdf", plot = stage_info_fig, width = 2.5, height = 5)
First overall breakdown by female and male without including unreported or hermaphroditic animals
EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::group_by(species_sex) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(species_sex = str_trim(species_sex)) %>%
tidyr::separate_rows(species_sex, sep = ";") %>% # a population can include multiple sexes, so the string needs splitting
dplyr::group_by(species_sex) %>%
dplyr::summarise(n_articles = sum(n), .groups = 'drop') %>% # now a sum for each sex
tidyr::complete(species_sex,
fill = list(n_articles = 0)) %>% # make a full df with empty categories = 0
dplyr::ungroup() %>%
dplyr::filter(species_sex == "Female" | species_sex == "Male") %>%
dplyr::mutate(total_sex = sum(n_articles)) %>%
dplyr::ungroup() %>%
dplyr::mutate(overall_percent = round(n_articles/total_sex*100,1))
## # A tibble: 2 × 4
## species_sex n_articles total_sex overall_percent
## <chr> <int> <int> <dbl>
## 1 Female 280 623 44.9
## 2 Male 343 623 55.1
Let’s take a look at the sex of spp used in the EIPAAB database
sex_summary_all <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::group_by(study_motivation, species_sex) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(species_sex = str_trim(species_sex)) %>%
tidyr::separate_rows(species_sex, sep = ";") %>% # a population can include multiple sexes, so the string needs splitting
dplyr::group_by(species_sex, study_motivation) %>%
dplyr::summarise(n_articles = sum(n), .groups = 'drop') %>% # now a sum for each sex
tidyr::complete(species_sex, study_motivation,
fill = list(n_articles = 0)) %>% # make a full df with empty categories = 0
dplyr::ungroup() %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_sex = sum(n_articles)) %>%
dplyr::ungroup() %>%
dplyr::mutate(overall_prop = n_articles/total_sex)
sex_summary_all
## # A tibble: 12 × 5
## species_sex study_motivation n_articles total_sex overall_prop
## <chr> <fct> <int> <int> <dbl>
## 1 Female Environmental 136 645 0.211
## 2 Female Medical 91 322 0.283
## 3 Female Basic research 53 206 0.257
## 4 Hermaphrodites Environmental 4 645 0.00620
## 5 Hermaphrodites Medical 0 322 0
## 6 Hermaphrodites Basic research 0 206 0
## 7 Male Environmental 180 645 0.279
## 8 Male Medical 99 322 0.307
## 9 Male Basic research 64 206 0.311
## 10 Unknown or not specified Environmental 325 645 0.504
## 11 Unknown or not specified Medical 132 322 0.410
## 12 Unknown or not specified Basic research 89 206 0.432
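To see what the separate_rows() and complete() steps are doing, here is a minimal sketch on a made-up table. The object names (toy, toy_summary) and all of the values are hypothetical and are not taken from the database; the sketch also uses dplyr::count() as a shorthand for the group_by() and reframe() steps used above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per population, with ";"-separated sex values
toy <- tibble::tibble(
  unique_population_id = c("a1", "a2", "a3", "a4"),
  study_motivation = factor(c("Environmental", "Environmental", "Medical", "Medical"),
                            levels = c("Environmental", "Medical")),
  species_sex = c("Female;Male", "Male", "Female", "Unknown or not specified")
)

toy_summary <- toy %>%
  tidyr::separate_rows(species_sex, sep = ";") %>% # one row per sex category
  dplyr::count(study_motivation, species_sex, name = "n_articles") %>% # count rows per combination
  tidyr::complete(species_sex, study_motivation,
                  fill = list(n_articles = 0L)) %>% # add missing combinations as 0
  dplyr::group_by(study_motivation) %>%
  dplyr::mutate(total_sex = sum(n_articles),
                overall_prop = n_articles / total_sex) %>%
  dplyr::ungroup()

toy_summary
```

Note that a population listing two sexes contributes one row to each category after splitting, so total_sex can be larger than the number of populations.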
Let’s look at just the proportion of those defined as male and female
sex_male_female <- sex_summary_all %>%
dplyr::filter(species_sex == "Female" | species_sex == "Male") %>%
dplyr::select(-total_sex, -overall_prop) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_male_female = sum(n_articles)) %>%
dplyr::ungroup() %>%
dplyr::mutate(prop = n_articles/total_male_female)
Making the plot
sex_order <- c("Female", "Male")
color_theme <- c("#eb4729", "#1b909a")
# Calculate cumulative positions for text labels
sex_male_female <- sex_male_female %>%
dplyr::mutate(species_sex = factor(species_sex, levels = sex_order)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::arrange(desc(species_sex)) %>%
dplyr::mutate(cumulative_prop = cumsum(prop) - prop / 2)
# Create the plot
sex_fig <- sex_male_female %>%
mutate(study_motivation = fct_relevel(study_motivation, "Basic research", "Medical", "Environmental")) %>%
ggplot(aes(y = prop, x = study_motivation, fill = species_sex, group = species_sex)) +
geom_bar(stat = "identity", width = 0.9) +
geom_text(aes(label = round(prop, 2), y = cumulative_prop), color = "white", size = 3) +
scale_fill_manual(values = color_theme, name = "Sex") +
theme_classic() +
theme(legend.position = "right") +
labs(
x = "Study motivation",
y = "Proportion of all species assigned to female or male"
) +
coord_flip()
sex_fig
setwd(figures_path)
ggsave("spp_sex_fig.pdf", plot = sex_fig, width = 10, height = 5)
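The label positions used in the plot come from a running total. Here is a small sketch of that arithmetic on hypothetical proportions (toy_bar and its values are made up). It assumes ggplot2's default stacking order, in which the last factor level sits closest to zero, which is why the rows are sorted in descending order before cumsum().

```r
library(dplyr)

# Hypothetical proportions for a single stacked bar (they sum to 1)
toy_bar <- tibble::tibble(
  species_sex = factor(c("Female", "Male"), levels = c("Female", "Male")),
  prop = c(0.4, 0.6)
)

toy_labels <- toy_bar %>%
  dplyr::arrange(desc(species_sex)) %>% # start with the segment nearest zero
  dplyr::mutate(upper = cumsum(prop), # top edge of each segment
                cumulative_prop = upper - prop / 2) # midpoint of each segment

toy_labels
```

Subtracting half of each segment from its top edge gives the midpoint, so the text is centred in its segment. As far as I know, geom_text(position = position_stack(vjust = 0.5)) is an alternative way to get centred labels without computing the positions by hand.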
Now let’s look at those not assigned to a sex
sex_summary_info <- sex_summary_all %>%
dplyr::mutate(sex_reported = if_else(species_sex == "Unknown or not specified", "No", "Yes")) %>%
dplyr::group_by(sex_reported) %>%
dplyr::reframe(n = sum(n_articles))
n_total <- sex_summary_info %>%
dplyr::reframe(n_total = sum(n)) %>%
dplyr::pull(n_total)
sex_summary_info <- sex_summary_info %>%
dplyr::mutate(prop = n/n_total)
Here’s the plot
# Define the black and grey color theme
black_and_grey <- c("#BCBEC0", "#414042")
yes_order <- c("Yes", "No")
# Calculate cumulative positions for text labels
sex_summary_info <- sex_summary_info %>%
dplyr::mutate(sex_reported = factor(sex_reported, levels = yes_order)) %>%
dplyr::arrange(desc(sex_reported)) %>%
dplyr::mutate(cumulative_prop = cumsum(prop) - prop / 2)
sex_info_fig <- sex_summary_info %>%
dplyr::mutate(sex_reported = factor(sex_reported, levels = yes_order)) %>%
ggplot(aes(y = prop, x = 1, fill = sex_reported)) +
geom_bar(stat = "identity", width = 0.9) +
geom_text(aes(label = round(prop, 2), y = cumulative_prop), color = "white") +
scale_fill_manual(values = black_and_grey, name = "Sex reported") +
theme_classic() +
theme(legend.position = "right",
axis.text.x = element_blank(), # Remove x-axis text
axis.ticks.x = element_blank() # Remove x-axis ticks
) +
labs(
x = "",
y = "Proportion of all species"
)
sex_info_fig
setwd(figures_path)
ggsave("spp_sex_info_fig.pdf", plot = sex_info_fig, width = 2.5, height = 5)
Breakdown of where the animals were sourced, excluding cases where the source was not reported
EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::group_by(species_source) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(species_source = str_trim(species_source)) %>%
tidyr::separate_rows(species_source, sep = ";") %>% # a population can list more than one source, so the string needs splitting
dplyr::group_by(species_source) %>%
dplyr::summarise(n_articles = sum(n), .groups = 'drop') %>% # now a sum for each source category
tidyr::complete(species_source,
fill = list(n_articles = 0)) %>% # make a full df with empty categories = 0
dplyr::filter(species_source != "Not reported") %>%
dplyr::ungroup() %>%
dplyr::mutate(total_source = sum(n_articles)) %>%
dplyr::ungroup() %>%
dplyr::mutate(overall_percent = round(n_articles/total_source*100, 1)) %>%
dplyr::arrange(desc(overall_percent))
## # A tibble: 5 × 4
## species_source n_articles total_source overall_percent
## <chr> <int> <int> <dbl>
## 1 Commercial supplier or fish farm 305 802 38
## 2 Lab stock of undisclosed origin 213 802 26.6
## 3 Wild collected 196 802 24.4
## 4 Lab stock from commercial supplier 55 802 6.9
## 5 Lab stock from wild population 33 802 4.1
Let’s look at where the animals were sourced for articles in the EIPAAB database
source_summary_all <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>% # sampling one row per article per species (i.e. ignoring multiple rows per article for compounds)
dplyr::ungroup() %>%
dplyr::group_by(study_motivation, species_source) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(species_source = str_trim(species_source)) %>%
tidyr::separate_rows(species_source, sep = ";") %>% # a population can list more than one source, so the string needs splitting
dplyr::group_by(species_source, study_motivation) %>%
dplyr::summarise(n_articles = sum(n), .groups = 'drop') %>% # now a sum for each source category within each motivation
tidyr::complete(species_source, study_motivation,
fill = list(n_articles = 0)) %>% # make a full df with empty categories = 0
dplyr::ungroup() %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_source = sum(n_articles)) %>%
dplyr::ungroup() %>%
dplyr::mutate(overall_prop = n_articles/total_source)
source_summary_all
## # A tibble: 18 × 5
## species_source study_motivation n_articles total_source overall_prop
## <chr> <fct> <int> <int> <dbl>
## 1 Commercial supplier or… Environmental 134 548 0.245
## 2 Commercial supplier or… Medical 101 241 0.419
## 3 Commercial supplier or… Basic research 71 161 0.441
## 4 Lab stock from commerc… Environmental 29 548 0.0529
## 5 Lab stock from commerc… Medical 16 241 0.0664
## 6 Lab stock from commerc… Basic research 10 161 0.0621
## 7 Lab stock from wild po… Environmental 25 548 0.0456
## 8 Lab stock from wild po… Medical 1 241 0.00415
## 9 Lab stock from wild po… Basic research 7 161 0.0435
## 10 Lab stock of undisclos… Environmental 119 548 0.217
## 11 Lab stock of undisclos… Medical 64 241 0.266
## 12 Lab stock of undisclos… Basic research 30 161 0.186
## 13 Not reported Environmental 72 548 0.131
## 14 Not reported Medical 51 241 0.212
## 15 Not reported Basic research 25 161 0.155
## 16 Wild collected Environmental 169 548 0.308
## 17 Wild collected Medical 8 241 0.0332
## 18 Wild collected Basic research 18 161 0.112
source_summary <- source_summary_all %>%
dplyr::filter(species_source != "Not reported") %>%
dplyr::select(species_source, study_motivation, n_articles) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_reported = sum(n_articles)) %>%
dplyr::ungroup() %>%
dplyr::mutate(prop = n_articles/total_reported)
source_summary
## # A tibble: 15 × 5
## species_source study_motivation n_articles total_reported prop
## <chr> <fct> <int> <int> <dbl>
## 1 Commercial supplier or fi… Environmental 134 476 0.282
## 2 Commercial supplier or fi… Medical 101 190 0.532
## 3 Commercial supplier or fi… Basic research 71 136 0.522
## 4 Lab stock from commercial… Environmental 29 476 0.0609
## 5 Lab stock from commercial… Medical 16 190 0.0842
## 6 Lab stock from commercial… Basic research 10 136 0.0735
## 7 Lab stock from wild popul… Environmental 25 476 0.0525
## 8 Lab stock from wild popul… Medical 1 190 0.00526
## 9 Lab stock from wild popul… Basic research 7 136 0.0515
## 10 Lab stock of undisclosed … Environmental 119 476 0.25
## 11 Lab stock of undisclosed … Medical 64 190 0.337
## 12 Lab stock of undisclosed … Basic research 30 136 0.221
## 13 Wild collected Environmental 169 476 0.355
## 14 Wild collected Medical 8 190 0.0421
## 15 Wild collected Basic research 18 136 0.132
source_order <- c("Wild collected", "Lab stock from wild population", "Lab stock of undisclosed origin",
"Lab stock from commercial supplier", "Commercial supplier or fish farm")
# Define the colour theme for the five source categories
color_theme <- c("#607C3B", "#A7D271", "#6B6E70", "#A66EAF", "#61346B")
# Calculate cumulative positions for text labels
source_summary <- source_summary %>%
dplyr::mutate(species_source = factor(species_source, levels = source_order)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::arrange(desc(species_source)) %>%
dplyr::mutate(cumulative_prop = cumsum(prop) - prop / 2) # midpoint of each stacked segment
# Create the plot
source_fig <- source_summary %>%
mutate(study_motivation = fct_relevel(study_motivation, "Basic research", "Medical", "Environmental")) %>%
ggplot(aes(y = prop, x = study_motivation, fill = species_source, group = species_source)) +
geom_bar(stat = "identity", width = 0.9) +
geom_text(aes(label = round(prop, 2), y = cumulative_prop), color = "white", size = 3) +
scale_fill_manual(values = color_theme, name = "Source") +
theme_classic() +
theme(legend.position = "right") +
labs(
x = "Study motivation",
y = "Proportion of all species with a described source"
) +
coord_flip()
source_fig
setwd(figures_path)
ggsave("spp_source_fig.pdf", plot = source_fig, width = 10, height = 5)
Now let’s look at those not assigned a source
source_summary_info <- source_summary_all %>%
dplyr::mutate(source_reported = if_else(species_source == "Not reported", "No", "Yes")) %>%
dplyr::group_by(source_reported) %>%
dplyr::reframe(n = sum(n_articles))
n_total <- source_summary_info %>%
dplyr::reframe(n_total = sum(n)) %>%
dplyr::pull(n_total)
source_summary_info <- source_summary_info %>%
dplyr::mutate(prop = n/n_total)
Here’s the plot
# Define the black and grey color theme
black_and_grey <- c("#BCBEC0", "#414042")
yes_order <- c("Yes", "No")
# Calculate cumulative positions for text labels
source_summary_info <- source_summary_info %>%
dplyr::mutate(source_reported = factor(source_reported, levels = yes_order)) %>%
dplyr::arrange(desc(source_reported)) %>%
dplyr::mutate(cumulative_prop = cumsum(prop) - prop / 2)
source_info_fig <- source_summary_info %>%
dplyr::mutate(source_reported = factor(source_reported, levels = yes_order)) %>%
ggplot(aes(y = prop, x = 1, fill = source_reported)) +
geom_bar(stat = "identity", width = 0.9) +
geom_text(aes(label = round(prop, 2), y = cumulative_prop), color = "white") +
scale_fill_manual(values = black_and_grey, name = "Source reported") +
theme_classic() +
theme(legend.position = "right",
axis.text.x = element_blank(), # Remove x-axis text
axis.ticks.x = element_blank() # Remove x-axis ticks
) +
labs(
x = "",
y = "Proportion of all species"
)
source_info_fig
setwd(figures_path)
ggsave("spp_source_info_fig.pdf", plot = source_info_fig, width = 5, height = 10)
There are 426 distinct compounds in the database
EIPAAB_database %>%
dplyr::distinct(compound_name) %>%
nrow()
## [1] 426
article_n <- EIPAAB_database %>%
dplyr::distinct(article_id) %>%
nrow(.)
compound_n_summary <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(compound_n) %>%
dplyr::reframe(n = n(),
percent = round((n/article_n)*100,1))
compound_n_summary
## # A tibble: 18 × 3
## compound_n n percent
## <int> <int> <dbl>
## 1 1 624 69.3
## 2 2 127 14.1
## 3 3 67 7.4
## 4 4 32 3.6
## 5 5 16 1.8
## 6 6 8 0.9
## 7 7 6 0.7
## 8 8 5 0.6
## 9 9 1 0.1
## 10 10 2 0.2
## 11 11 3 0.3
## 12 12 2 0.2
## 13 13 2 0.2
## 14 14 2 0.2
## 15 16 1 0.1
## 16 18 1 0.1
## 17 25 1 0.1
## 18 52 1 0.1
How many articles used more than 5 compounds?
compound_n_summary %>%
dplyr::filter(compound_n > 5) %>%
reframe(n = sum(n),
percent = round((n/article_n)*100,1))
## # A tibble: 1 × 2
## n percent
## <int> <dbl>
## 1 35 3.9
compound_n_summary <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(compound_n, study_motivation) %>%
dplyr::summarise(n = n(), .groups = 'drop') %>%
tidyr::complete(compound_n, study_motivation, fill = list(n = 0)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(prop_motivation = n/total_motivation)
compound_n_summary
## # A tibble: 54 × 5
## compound_n study_motivation n total_motivation prop_motivation
## <int> <fct> <int> <int> <dbl>
## 1 1 Environmental 389 510 0.763
## 2 1 Medical 135 234 0.577
## 3 1 Basic research 100 157 0.637
## 4 2 Environmental 61 510 0.120
## 5 2 Medical 42 234 0.179
## 6 2 Basic research 24 157 0.153
## 7 3 Environmental 27 510 0.0529
## 8 3 Medical 26 234 0.111
## 9 3 Basic research 14 157 0.0892
## 10 4 Environmental 17 510 0.0333
## # ℹ 44 more rows
# Define the colour palette
motivation_colour_theme <- c("#60BD6C", "#D359A1", "#3C82C4") # Making colour theme to apply to plot
compound_n_oder <- c(1:9, "10+") # the last bin pools articles with 10 or more compounds
compound_n_fig <- compound_n_summary %>%
dplyr::mutate(
compound_n = if_else(compound_n >= 10, "10+", as.character(compound_n)) # Grouping cases of 10 and above
) %>%
dplyr::group_by(compound_n, study_motivation) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(compound_n = factor(compound_n, levels = compound_n_oder)) %>%
ggplot(aes(x=compound_n, y=n, colour = study_motivation, fill = study_motivation, group = study_motivation)) +
geom_col(position = position_dodge(width = 0.8), width = 0.1) +
geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = n), vjust=-0.6, size=3.5, color="black", position = position_dodge(width = 0.8)) +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
labs(
x = "",
y = "Number of studies"
) +
theme()
compound_n_fig
setwd(figures_path)
ggsave("comp_compound_n_fig.pdf", plot = compound_n_fig, width = 10, height = 5)
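The plotting chunk above pools every article with 10 or more compounds into a single top category before plotting. Here is a minimal sketch of that binning step on made-up counts. The names toy_counts, bin_levels and compound_bin, and the "10+" label, are hypothetical choices for this example.

```r
library(dplyr)

# Hypothetical counts of articles (n) by number of compounds tested (compound_n)
toy_counts <- tibble::tibble(
  compound_n = c(1L, 2L, 9L, 10L, 11L, 52L),
  n = c(5L, 3L, 1L, 2L, 3L, 1L)
)

bin_levels <- c(as.character(1:9), "10+") # plotting order, with the pooled bin last

toy_binned <- toy_counts %>%
  dplyr::mutate(compound_bin = if_else(compound_n >= 10L, "10+", as.character(compound_n)),
                compound_bin = factor(compound_bin, levels = bin_levels)) %>%
  dplyr::group_by(compound_bin) %>%
  dplyr::summarise(n = sum(n), .groups = "drop") # add up the counts that now share a bin

toy_binned
```

Turning the bin into a factor with explicit levels keeps the categories in numeric order on the axis, rather than the alphabetical order that character values would get.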
First, let’s see how many compounds have an ATC classification.
305 out of 426 (71.6%) do.
EIPAAB_database %>%
dplyr::group_by(compound_name) %>%
dplyr::sample_n(1) %>% # sampling one row per compound (i.e. ignoring repeated uses of the same compound)
dplyr::ungroup() %>%
dplyr::group_by(compound_atc_boolean) %>%
dplyr::reframe(n = length(compound_atc_boolean))
## # A tibble: 2 × 2
## compound_atc_boolean n
## <chr> <int>
## 1 No 121
## 2 Yes 305
Let’s see how many classes there are at the Anatomical Therapeutic Chemical (ATC) level 1
There are 14 classes at the 1st ATC level (the highest level of the ATC system). That is every class that exists at the first level
n_compound_atc <- EIPAAB_database %>%
dplyr::filter(compound_atc_boolean == "Yes") %>%
dplyr::distinct(compound_name) %>%
nrow(.)
compound_ATC_L1_summary <- EIPAAB_database %>%
dplyr::group_by(compound_name) %>%
dplyr::sample_n(1) %>% # sampling one row per compound (i.e. ignoring repeated uses of the same compound)
dplyr::ungroup() %>%
dplyr::filter(compound_atc_boolean == "Yes") %>%
dplyr::group_by(compound_atc_level_1) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(compound_atc_level_1 = str_trim(compound_atc_level_1)) %>%
tidyr::separate_rows(compound_atc_level_1, sep = ";") %>% # a compound can have multiple ATC codes, so the string needs splitting
dplyr::group_by(compound_atc_level_1) %>%
dplyr::reframe(n = sum(n),
percent = round(n/n_compound_atc*100,1),
measure = "compounds") %>% # now a sum for each ATC level 1 class
arrange(desc(n))
compound_ATC_L1_summary
## # A tibble: 14 × 4
## compound_atc_level_1 n percent measure
## <chr> <int> <dbl> <chr>
## 1 n nervous system 137 44.9 compou…
## 2 c cardiovascular system 49 16.1 compou…
## 3 a alimentary tract and metabolism 35 11.5 compou…
## 4 s sensory organs 34 11.1 compou…
## 5 g genito urinary system and sex hormones 30 9.8 compou…
## 6 j antiinfectives for systemic use 28 9.2 compou…
## 7 d dermatologicals 27 8.9 compou…
## 8 r respiratory system 26 8.5 compou…
## 9 l antineoplastic and immunomodulating agents 19 6.2 compou…
## 10 m musculo-skeletal system 12 3.9 compou…
## 11 v various 9 3 compou…
## 12 h systemic hormonal preparations, excl. sex hormones a… 8 2.6 compou…
## 13 p antiparasitic products, insecticides and repellents 6 2 compou…
## 14 b blood and blood forming organs 4 1.3 compou…
Now we will make a similar data file to look at the overall use in the database at each ATC level 1
n_data_atc <- EIPAAB_database %>%
dplyr::filter(compound_atc_boolean == "Yes") %>%
nrow(.)
compound_ATC_L1_data_summary <- EIPAAB_database %>%
dplyr::filter(compound_atc_boolean == "Yes") %>%
dplyr::group_by(compound_atc_level_1) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(compound_atc_level_1 = str_trim(compound_atc_level_1)) %>%
tidyr::separate_rows(compound_atc_level_1, sep = ";") %>% # a compound can have multiple ATC codes, so the string needs splitting
dplyr::group_by(compound_atc_level_1) %>%
dplyr::reframe(n = sum(n),
percent =round(n/n_data_atc*100,1),
measure = "data") %>% # now a sum for each ATC level 1 class
arrange(desc(percent))
compound_ATC_L1_data_summary
## # A tibble: 14 × 4
## compound_atc_level_1 n percent measure
## <chr> <int> <dbl> <chr>
## 1 n nervous system 1120 72.9 data
## 2 g genito urinary system and sex hormones 201 13.1 data
## 3 c cardiovascular system 158 10.3 data
## 4 d dermatologicals 130 8.5 data
## 5 s sensory organs 102 6.6 data
## 6 a alimentary tract and metabolism 93 6.1 data
## 7 l antineoplastic and immunomodulating agents 89 5.8 data
## 8 r respiratory system 76 4.9 data
## 9 m musculo-skeletal system 58 3.8 data
## 10 j antiinfectives for systemic use 57 3.7 data
## 11 v various 32 2.1 data
## 12 h systemic hormonal preparations, excl. sex hormones a… 15 1 data
## 13 b blood and blood forming organs 12 0.8 data
## 14 p antiparasitic products, insecticides and repellents 7 0.5 data
ATC_L1_summary <- compound_ATC_L1_summary %>%
rbind(., compound_ATC_L1_data_summary) %>%
dplyr::mutate(value = if_else(measure == "compounds", n, percent))
This plot shows the number of different compounds in each ATC classification as well as the total proportion of data it makes up
measure_colour_theme <- c("black", "grey") # Making colour theme to apply to plot
# Making a list of ATC names in the order that we want them in the plot
level_1_order <- ATC_L1_summary %>%
dplyr::filter(measure == "data") %>%
dplyr::arrange(value) %>%
dplyr::pull(compound_atc_level_1)
# Making the plot
atc_level_1_fig <- ATC_L1_summary %>%
dplyr::mutate(compound_atc_level_1 = factor(compound_atc_level_1, levels = level_1_order)) %>%
ggplot(aes(x=compound_atc_level_1, y=value, colour = measure, fill = measure, group = measure)) +
geom_col(position = position_dodge(width = 0.8), width = 0.2, colour = NA) +
geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = value), hjust=-0.6, size=3.5, color="black", position = position_dodge(width = 0.8)) +
scale_colour_manual(values = measure_colour_theme, name = "Value type") +
scale_fill_manual(values = measure_colour_theme, name = "Value type") +
coord_flip() +
scale_y_continuous(
name = "Number of distinct compounds",
sec.axis = sec_axis(~ . , name = "Percentage of the database") # Identity transform: both axes share one scale
) +
theme_classic() +
labs(
x = "" # the y-axis titles are set in scale_y_continuous() above
) +
theme()
atc_level_1_fig
setwd(figures_path)
ggsave("comp_atc_level_1_fig.pdf", plot = atc_level_1_fig, width = 10, height = 10)
Let’s see how many classes there are at the Anatomical Therapeutic Chemical (ATC) level 3
There are 131 distinct classes
n_compound_atc <- EIPAAB_database %>%
dplyr::filter(compound_atc_boolean == "Yes") %>%
dplyr::distinct(compound_name) %>%
nrow(.)
compound_ATC_L3_summary <- EIPAAB_database %>%
dplyr::group_by(compound_name) %>%
dplyr::sample_n(1) %>% # sampling one row per compound (i.e. ignoring repeated uses of the same compound)
dplyr::ungroup() %>%
dplyr::filter(compound_atc_boolean == "Yes") %>%
dplyr::group_by(compound_atc_level_3) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(compound_atc_level_3 = str_trim(compound_atc_level_3)) %>%
tidyr::separate_rows(compound_atc_level_3, sep = ";") %>% # a compound can have multiple ATC codes, so the string needs splitting
dplyr::group_by(compound_atc_level_3) %>%
dplyr::reframe(n = sum(n),
percent = round(n/n_compound_atc*100,1),
measure = "compounds") %>% # now a sum for each ATC level 3 class
arrange(desc(n))
compound_ATC_L3_summary
## # A tibble: 131 × 4
## compound_atc_level_3 n percent measure
## <chr> <int> <dbl> <chr>
## 1 n06a antidepressants 27 8.9 compou…
## 2 n03a antiepileptics 18 5.9 compou…
## 3 n05a antipsychotics 14 4.6 compou…
## 4 a01a stomatological preparations 12 3.9 compou…
## 5 n05b anxiolytics 11 3.6 compou…
## 6 n05c hypnotics and sedatives 11 3.6 compou…
## 7 n06b psychostimulants, agents used for adhd and nootro… 11 3.6 compou…
## 8 r06a antihistamines for systemic use 11 3.6 compou…
## 9 c07a beta blocking agents 9 3 compou…
## 10 d04a antipruritics, incl. antihistamines, anesthetics,… 9 3 compou…
## # ℹ 121 more rows
Now we will make a similar data file to look at the overall use in the database at each ATC level 3
n_data_atc <- EIPAAB_database %>%
dplyr::filter(compound_atc_boolean == "Yes") %>%
nrow(.)
compound_ATC_L3_data_summary <- EIPAAB_database %>%
dplyr::filter(compound_atc_boolean == "Yes") %>%
dplyr::group_by(compound_atc_level_3) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(compound_atc_level_3 = str_trim(compound_atc_level_3)) %>%
tidyr::separate_rows(compound_atc_level_3, sep = ";") %>% # a compound can have multiple ATC codes, so the string needs splitting
dplyr::group_by(compound_atc_level_3) %>%
dplyr::reframe(n = sum(n),
percent = round(n/n_data_atc*100,1),
measure = "data") %>% # now a sum for each ATC level 3 class
arrange(desc(percent))
compound_ATC_L3_data_summary
## # A tibble: 131 × 4
## compound_atc_level_3 n percent measure
## <chr> <int> <dbl> <chr>
## 1 n06a antidepressants 425 27.7 data
## 2 n03a antiepileptics 164 10.7 data
## 3 n05b anxiolytics 149 9.7 data
## 4 g03c estrogens 121 7.9 data
## 5 n06b psychostimulants, agents used for adhd and nootro… 84 5.5 data
## 6 l02a hormones and related agents 64 4.2 data
## 7 d11a other dermatological preparations 62 4 data
## 8 n05a antipsychotics 60 3.9 data
## 9 m01a antiinflammatory and antirheumatic products, non-… 50 3.3 data
## 10 n02a opioids 51 3.3 data
## # ℹ 121 more rows
Here we make a new column called value, which holds the count of distinct compounds for the "compounds" rows and the percentage of the data for the "data" rows
ATC_L3_summary <- compound_ATC_L3_summary %>%
rbind(., compound_ATC_L3_data_summary) %>%
dplyr::mutate(value = if_else(measure == "compounds", n, percent))
This plot shows the number of different compounds in each ATC classification as well as the total proportion of data it makes up. This is done for only the 15 most commonly used groups.
measure_colour_theme <- c("black", "grey") # Making colour theme to apply to plot
# Making a list of ATC names in the order that we want them in the plot
level_3_order_top_15 <- ATC_L3_summary %>%
dplyr::filter(measure == "data") %>%
dplyr::arrange(desc(value)) %>%
dplyr::slice(1:15) %>%
dplyr::arrange(desc(value)) %>%
dplyr::pull(compound_atc_level_3)
# Making the plot
atc_level_3_fig <- ATC_L3_summary %>%
dplyr::filter(compound_atc_level_3 %in% level_3_order_top_15) %>%
dplyr::mutate(compound_atc_level_3 = factor(compound_atc_level_3, levels = level_3_order_top_15)) %>%
ggplot(aes(x=compound_atc_level_3, y=value, fill = measure, colour = measure,
group = measure)) +
geom_col(position = position_dodge(width = 1), width = 0.2, colour = NA) +
geom_point(position = position_dodge(width = 1), size = 3) +
geom_text(aes(label = value), vjust=-0.6, size=3.5, position = position_dodge(width = 1)) +
scale_colour_manual(values = measure_colour_theme, name = "Value type",
labels = c("Compounds (n)", "Percentage of data")) +
scale_fill_manual(values = measure_colour_theme, name = "Value type",
labels = c("Compounds (n)", "Percentage of data")) +
scale_y_continuous(
name = "Distinct compounds",
limits = c(0, 30), # Set the range of y-axis
breaks = c(0, 10, 20, 30), # Set the labels at 0, 10, 20, 30
sec.axis = sec_axis(~ . , name = "Percentage of database") # Adjust scaling if needed
) +
theme_classic() +
labs(
x = "",
y = ""
) +
theme(
axis.text.y = element_text(size = 8), # Change y-axis labels size
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size = 8), # Change x-axis labels orientation
legend.title = element_text(size = 12), # Change legend title size if needed
legend.text = element_text(size = 10)
)
atc_level_3_fig
setwd(figures_path)
ggsave("atc_level_3_fig.pdf", plot = atc_level_3_fig, width = 10, height = 5)
Overall the most common compounds are Fluoxetine, Diazepam and 17-alpha-ethinylestradiol
n_row <- EIPAAB_database %>%
nrow(.)
compound_use <- EIPAAB_database %>%
dplyr::group_by(compound_name) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(prop = n/n_row) %>%
arrange(desc(prop))
compound_use
## # A tibble: 426 × 3
## compound_name n prop
## <chr> <int> <dbl>
## 1 Fluoxetine 201 0.116
## 2 Diazepam 67 0.0385
## 3 17-alpha-ethinylestradiol 63 0.0362
## 4 Caffeine 45 0.0259
## 5 Venlafaxine 43 0.0247
## 6 Citalopram 42 0.0241
## 7 Sertraline 39 0.0224
## 8 Carbamazepine 38 0.0218
## 9 Buspirone 30 0.0172
## 10 Morphine 27 0.0155
## # ℹ 416 more rows
Let’s see what the numbers are for each motivation, but let’s also maintain the overall numbers so we can add it to the figure
n_row <- EIPAAB_database %>%
nrow(.)
compound_use_motivation <- EIPAAB_database %>%
dplyr::group_by(study_motivation, compound_name) %>%
dplyr::summarise(n = n(), .groups = 'drop') %>%
tidyr::complete(compound_name, study_motivation, fill = list(n = 0)) # Making sure we have a complete dataframe
compound_use_motivation <- compound_use %>%
dplyr::select(-prop) %>%
dplyr::mutate(study_motivation = "All") %>%
rbind(., compound_use_motivation)
compound_use_motivation
## # A tibble: 1,704 × 3
## compound_name n study_motivation
## <chr> <int> <chr>
## 1 Fluoxetine 201 All
## 2 Diazepam 67 All
## 3 17-alpha-ethinylestradiol 63 All
## 4 Caffeine 45 All
## 5 Venlafaxine 43 All
## 6 Citalopram 42 All
## 7 Sertraline 39 All
## 8 Carbamazepine 38 All
## 9 Buspirone 30 All
## 10 Morphine 27 All
## # ℹ 1,694 more rows
The top 10 based on each study motivation as well as the overall total
top_10_comp_all_fig <- compound_use_motivation %>%
dplyr::filter(study_motivation == "All") %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::mutate(compound_name = fct_reorder(compound_name, n)) %>%
ggplot(aes(x=compound_name, y=n)) +
geom_col(width = 0.1, colour = NA, fill = "grey") +
geom_point(size = 3, colour = "grey", fill = "grey") +
geom_text(aes(label = n), hjust=-0.6, size=3.5, color="black") +
coord_flip() +
theme_classic() +
labs(
title = "All",
x = "",
y = "Total use in the database"
) +
theme(
plot.title = element_text(size = 11)
)
top_10_comp_env_fig <- compound_use_motivation %>%
dplyr::filter(study_motivation == "Environmental") %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::mutate(compound_name = fct_reorder(compound_name, n)) %>%
ggplot(aes(x=compound_name, y=n)) +
geom_col(width = 0.1, colour = NA, fill = "#60BD6C") +
geom_point(size = 3, colour = "#60BD6C", fill = "#60BD6C") +
geom_text(aes(label = n), hjust=-0.6, size=3.5, color="black") +
coord_flip() +
theme_classic() +
labs(
title = "Environmental",
x = "",
y = "Total use in the database"
) +
theme(
plot.title = element_text(size = 11)
)
top_10_comp_med_fig <- compound_use_motivation %>%
dplyr::filter(study_motivation == "Medical") %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::mutate(compound_name = fct_reorder(compound_name, n)) %>%
ggplot(aes(x=compound_name, y=n)) +
geom_col(width = 0.1, colour = NA, fill = "#D359A1") +
geom_point(size = 3, colour = "#D359A1", fill = "#D359A1") +
geom_text(aes(label = n), hjust=-0.6, size=3.5, color="black") +
coord_flip() +
theme_classic() +
labs(
title = "Medical",
x = "",
y = "Total use in the database"
) +
theme(
plot.title = element_text(size = 11)
)
top_10_comp_base_fig <- compound_use_motivation %>%
dplyr::filter(study_motivation == "Basic research") %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::mutate(compound_name = fct_reorder(compound_name, n)) %>%
ggplot(aes(x=compound_name, y=n)) +
geom_col(width = 0.1, colour = NA, fill = "#3C82C4") +
geom_point(size = 3, colour = "#3C82C4", fill = "#3C82C4") +
geom_text(aes(label = n), hjust=-0.6, size=3.5, color="black") +
coord_flip() +
theme_classic() +
labs(
title = "Basic Research",
x = "",
y = "Total use in the database"
) +
theme(
plot.title = element_text(size = 11)
)
Here are the resulting figures
top_10_comp_all_fig
top_10_comp_env_fig
top_10_comp_med_fig
top_10_comp_base_fig
Saving as PDFs
setwd(figures_path)
ggsave("comp_top_10_comp_all_fig.pdf", plot = top_10_comp_all_fig, width = 5, height = 10)
ggsave("comp_top_10_comp_env_fig.pdf", plot = top_10_comp_env_fig, width = 5, height = 10)
ggsave("comp_top_10_comp_med_fig.pdf", plot = top_10_comp_med_fig, width = 5, height = 10)
ggsave("comp_top_10_comp_base_fig.pdf", plot = top_10_comp_base_fig, width = 5, height = 10)
compound_use_motivation %>%
dplyr::filter(compound_name == "17-alpha-ethinylestradiol")
## # A tibble: 4 × 3
## compound_name n study_motivation
## <chr> <int> <chr>
## 1 17-alpha-ethinylestradiol 63 All
## 2 17-alpha-ethinylestradiol 61 Environmental
## 3 17-alpha-ethinylestradiol 0 Medical
## 4 17-alpha-ethinylestradiol 2 Basic research
It was recorded whether the animals were also exposed to compound mixtures
EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::reframe(mixture_yes = sum(compond_mixture == "Yes", na.rm = TRUE),
mixture_no = sum(compond_mixture == "No", na.rm = TRUE),
mixture_percent = (mixture_yes/(mixture_yes + mixture_no))*100 # percentage of all articles that used a mixture
)
## # A tibble: 1 × 3
## mixture_yes mixture_no mixture_percent
## <int> <int> <dbl>
## 1 165 736 18.3
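Two different quantities can be calculated from the overall counts reported in this section (165 articles with a mixture, 736 without), and it is worth keeping them apart. The short sketch below works through both; the variable names ratio_yes_to_no and share_of_all are made up for this example.

```r
# Overall counts reported in this section
mixture_yes <- 165
mixture_no <- 736

# Articles with a mixture relative to articles without one (a ratio of counts)
ratio_yes_to_no <- mixture_yes / mixture_no

# Articles with a mixture relative to all articles (a share of the total)
share_of_all <- mixture_yes / (mixture_yes + mixture_no)

round(ratio_yes_to_no * 100, 1)
round(share_of_all * 100, 1)
```

The ratio is always the larger of the two, because its denominator leaves out the articles that used a mixture. Only the second quantity answers the question "what percentage of articles used a mixture?".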
Medical articles have a much higher use of mixtures
EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(study_motivation) %>%
dplyr::reframe(mixture_yes = sum(compond_mixture == "Yes", na.rm = TRUE),
mixture_no = sum(compond_mixture == "No", na.rm = TRUE),
mixture_percent = round((mixture_yes/(mixture_yes + mixture_no))*100,1) # percentage of articles within each motivation
)
## # A tibble: 3 × 4
## study_motivation mixture_yes mixture_no mixture_percent
## <fct> <int> <int> <dbl>
## 1 Environmental 57 453 11.2
## 2 Medical 76 158 32.5
## 3 Basic research 32 125 20.4
Data on the method of exposure was also extracted
nrow <- EIPAAB_database %>%
nrow(.)
EIPAAB_database %>%
dplyr::group_by(compound_expose_route) %>%
dplyr::reframe(n = n(),
percent = round((n/nrow)*100,1)
)
## # A tibble: 3 × 3
## compound_expose_route n percent
## <chr> <int> <dbl>
## 1 Other exposure route 223 12.8
## 2 Waterborne only 1501 86.3
## 3 Waterborne plus any other route 16 0.9
The database has both the minimum and maximum duration of exposure prior to behavioural measure (compound_min_duration_exposure and compound_max_duration_exposure). Here we will focus on the maximum duration
These are the different categories of exposure length
EIPAAB_database %>%
dplyr::distinct(compound_max_duration_exposure)
## compound_max_duration_exposure
## 1 Less than 6 hours
## 2 1 to 3 months
## 3 3 to 8 days
## 4 22 to 29 days
## 5 Multigenerational
## 6 6 to 24 hours
## 7 1 to 3 days
## 8 8 to 15 days
## 9 Not stated
## 10 15 to 22 days
## 11 Transgenerational
## 12 3 to 6 months
## 13 Lifetime
Some articles did not report the exposure duration at all, or in sufficient detail to extract.
In total this occurred in 108 cases
EIPAAB_database %>%
dplyr::filter(compound_min_duration_exposure == "Not stated" | compound_max_duration_exposure == "Not stated") %>%
nrow(.)
## [1] 108
exposure_duration_order <- c("Less than 6 hours", "6 to 24 hours", "1 to 3 days", "3 to 8 days", "8 to 15 days", "15 to 22 days", "22 to 29 days", "1 to 3 months", "3 to 6 months", "Lifetime", "Transgenerational", "Multigenerational")
nrow <- EIPAAB_database %>%
dplyr::filter(compound_max_duration_exposure != "Not stated") %>%
nrow(.)
exposure_duration_summary <- EIPAAB_database %>%
dplyr::filter(compound_max_duration_exposure != "Not stated") %>%
dplyr::group_by(compound_max_duration_exposure) %>%
dplyr::reframe(n = n(),
percent = round((n/nrow)*100,1)
) %>%
dplyr::mutate(compound_max_duration_exposure = factor(compound_max_duration_exposure,
levels = exposure_duration_order)) %>%
dplyr::arrange(compound_max_duration_exposure) %>%
dplyr::mutate(study_motivation = "All")
exposure_duration_summary
## # A tibble: 12 × 4
## compound_max_duration_exposure n percent study_motivation
## <fct> <int> <dbl> <chr>
## 1 Less than 6 hours 679 41.3 All
## 2 6 to 24 hours 129 7.8 All
## 3 1 to 3 days 106 6.4 All
## 4 3 to 8 days 318 19.3 All
## 5 8 to 15 days 101 6.1 All
## 6 15 to 22 days 113 6.9 All
## 7 22 to 29 days 59 3.6 All
## 8 1 to 3 months 83 5 All
## 9 3 to 6 months 23 1.4 All
## 10 Lifetime 9 0.5 All
## 11 Transgenerational 15 0.9 All
## 12 Multigenerational 9 0.5 All
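The summary above relies on converting the duration categories to a factor with explicit levels, so that they sort in a logical order rather than alphabetically. A minimal base-R sketch of that idea, using four of the category labels from the database and illustrative object names:

```r
# A few duration labels in arbitrary order
durations <- c("1 to 3 days", "Less than 6 hours", "3 to 8 days", "6 to 24 hours")

# The order we actually want (shortest to longest)
duration_order <- c("Less than 6 hours", "6 to 24 hours", "1 to 3 days", "3 to 8 days")

# Sorting plain character strings ignores the meaning of the labels
sort(durations)

# Sorting a factor follows the level order instead
duration_factor <- factor(durations, levels = duration_order)
sorted_durations <- as.character(sort(duration_factor))
sorted_durations
```

The same factor levels also control the axis order in ggplot2, which is why the script reuses exposure_duration_order when building the figures.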
exposure_duration_order <- c("Less than 6 hours", "6 to 24 hours", "1 to 3 days", "3 to 8 days", "8 to 15 days", "15 to 22 days", "22 to 29 days", "1 to 3 months", "3 to 6 months", "Lifetime", "Transgenerational", "Multigenerational")
exposure_duration_motivation_summary <- EIPAAB_database %>%
dplyr::filter(compound_max_duration_exposure != "Not stated") %>%
dplyr::group_by(compound_max_duration_exposure, study_motivation) %>%
dplyr::summarise(n = n(), .groups = 'drop') %>%
tidyr::complete(compound_max_duration_exposure, study_motivation, fill = list(n = 0)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round((n/total_motivation)*100,1)) %>%
dplyr::select(-total_motivation)
exp_duration_motivation_summary <- exposure_duration_motivation_summary %>%
rbind(exposure_duration_summary) %>%
dplyr::mutate(compound_max_duration_exposure = factor(compound_max_duration_exposure,
levels = rev(exposure_duration_order))
) %>%
dplyr::arrange(desc(compound_max_duration_exposure))
exp_duration_motivation_summary
## # A tibble: 48 × 4
## compound_max_duration_exposure study_motivation n percent
## <fct> <fct> <int> <dbl>
## 1 Less than 6 hours Environmental 124 15
## 2 Less than 6 hours Medical 285 61.2
## 3 Less than 6 hours Basic research 270 76.7
## 4 Less than 6 hours All 679 41.3
## 5 6 to 24 hours Environmental 62 7.5
## 6 6 to 24 hours Medical 53 11.4
## 7 6 to 24 hours Basic research 14 4
## 8 6 to 24 hours All 129 7.8
## 9 1 to 3 days Environmental 74 9
## 10 1 to 3 days Medical 22 4.7
## # ℹ 38 more rows
exp_duration_all_fig <- exp_duration_motivation_summary %>%
dplyr::filter(study_motivation == "All") %>%
ggplot(aes(x=compound_max_duration_exposure, y=percent)) +
geom_col(width = 0.1, colour = NA, fill = "grey") +
geom_point(size = 3, colour = "grey", fill = "grey") +
geom_text(aes(label = percent), vjust=-0.6, size=3.5, color="black") +
theme_classic() +
coord_flip() +
labs(
title = "All",
x = "",
y = "Total percentage"
) +
theme(
plot.title = element_text(size = 11)
)
exp_duration_env_fig <- exp_duration_motivation_summary %>%
dplyr::filter(study_motivation == "Environmental") %>%
ggplot(aes(x=compound_max_duration_exposure, y=percent)) +
geom_col(width = 0.1, colour = NA, fill = "#60BD6C") +
geom_point(size = 3, colour = "#60BD6C", fill = "#60BD6C") +
geom_text(aes(label = percent), vjust=-0.6, size=3.5, color="black") +
theme_classic() +
coord_flip() +
labs(
title = "Environmental",
x = "",
y = "Total percentage"
) +
theme(
plot.title = element_text(size = 11)
)
exp_duration_med_fig <- exp_duration_motivation_summary %>%
dplyr::filter(study_motivation == "Medical") %>%
ggplot(aes(x=compound_max_duration_exposure, y=percent)) +
geom_col(width = 0.1, colour = NA, fill = "#D359A1") +
geom_point(size = 3, colour = "#D359A1", fill = "#D359A1") +
geom_text(aes(label = percent), vjust=-0.6, size=3.5, color="black") +
theme_classic() +
coord_flip() +
labs(
title = "Medical",
x = "",
y = "Total percentage"
) +
theme(
plot.title = element_text(size = 11)
)
exp_duration_base_fig <- exp_duration_motivation_summary %>%
dplyr::filter(study_motivation == "Basic research") %>%
ggplot(aes(x=compound_max_duration_exposure, y=percent)) +
geom_col(width = 0.1, colour = NA, fill = "#3C82C4") +
geom_point(size = 3, colour = "#3C82C4", fill = "#3C82C4") +
geom_text(aes(label = percent), vjust=-0.6, size=3.5, color="black") +
theme_classic() +
coord_flip() +
labs(
title = "Basic research",
x = "",
y = "Total percentage"
) +
theme(
plot.title = element_text(size = 11)
)
exp_duration_all_fig
exp_duration_env_fig
exp_duration_med_fig
exp_duration_base_fig
setwd(figures_path)
ggsave("comp_exp_duration_all_fig.pdf", plot = exp_duration_all_fig, width = 8.3/3, height = 11.7/3)
ggsave("comp_exp_duration_env_fig.pdf", plot = exp_duration_env_fig, width = 8.3/3, height = 11.7/3)
ggsave("comp_exp_duration_med_fig.pdf", plot = exp_duration_med_fig, width = 8.3/3, height = 11.7/3)
ggsave("comp_exp_duration_base_fig.pdf", plot = exp_duration_base_fig, width = 8.3/3, height = 11.7/3)
Here I will look at the number of doses used. This was measured as the total number of treatments (i.e. including the control), so to get the number of doses for the compound we need to subtract 1. I have done this below.
n_doses_summary <- EIPAAB_database %>%
dplyr::filter(!is.na(compound_treatment_levels)) %>%
dplyr::mutate(n_doses = compound_treatment_levels-1) %>%
dplyr::group_by(n_doses) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(total = sum(n),
percent = round((n/total)*100,1),
study_motivation = "All")
n_doses_summary
## # A tibble: 14 × 5
## n_doses n total percent study_motivation
## <dbl> <int> <int> <dbl> <chr>
## 1 1 450 1514 29.7 All
## 2 2 203 1514 13.4 All
## 3 3 377 1514 24.9 All
## 4 4 172 1514 11.4 All
## 5 5 163 1514 10.8 All
## 6 6 46 1514 3 All
## 7 7 50 1514 3.3 All
## 8 8 14 1514 0.9 All
## 9 9 11 1514 0.7 All
## 10 10 10 1514 0.7 All
## 11 11 10 1514 0.7 All
## 12 12 5 1514 0.3 All
## 13 13 2 1514 0.1 All
## 14 17 1 1514 0.1 All
Let’s see how many use more than 5 doses
n_doses_summary %>%
dplyr::filter(n_doses > 5) %>%
dplyr::summarise(over_5_percent = sum(percent))
## # A tibble: 1 × 1
## over_5_percent
## <dbl>
## 1 9.8
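The value above is the sum of percentages that were already rounded. As a hedged alternative, the same figure can be computed directly from the counts, which avoids accumulating rounding error. This base-R sketch copies the counts from the n_doses_summary table above; the object names are illustrative.

```r
# Counts copied from the n_doses_summary table above
dose_counts <- data.frame(
  n_doses = c(1:13, 17),
  n = c(450, 203, 377, 172, 163, 46, 50, 14, 11, 10, 10, 5, 2, 1)
)

# Total number of rows with a reported number of treatment levels
total_rows <- sum(dose_counts$n)

# Rows that used more than 5 doses, then the percentage of the total
over_5_n <- sum(dose_counts$n[dose_counts$n_doses > 5])
over_5_percent <- round(over_5_n / total_rows * 100, 1)
over_5_percent
```

Here both approaches give the same answer, but with many small categories the two can differ in the last decimal place.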
Looking by study motivation
n_doses_motivation_summary <- EIPAAB_database %>%
dplyr::filter(!is.na(compound_treatment_levels)) %>%
dplyr::mutate(n_doses = compound_treatment_levels-1) %>%
dplyr::group_by(n_doses, study_motivation) %>%
dplyr::summarise(n = n(), .groups = 'drop') %>%
tidyr::complete(n_doses, study_motivation, fill = list(n = 0)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round((n/total)*100,1))
n_doses_motivation_summary
## # A tibble: 42 × 5
## n_doses study_motivation n total percent
## <dbl> <fct> <int> <int> <dbl>
## 1 1 Environmental 175 824 21.2
## 2 1 Medical 173 397 43.6
## 3 1 Basic research 102 293 34.8
## 4 2 Environmental 142 824 17.2
## 5 2 Medical 31 397 7.8
## 6 2 Basic research 30 293 10.2
## 7 3 Environmental 184 824 22.3
## 8 3 Medical 88 397 22.2
## 9 3 Basic research 105 293 35.8
## 10 4 Environmental 112 824 13.6
## # ℹ 32 more rows
Making a plot
dose_order <- c(1:12, ">12")
doses_fig <- n_doses_motivation_summary %>%
dplyr::mutate(n_doses = as.character(if_else(n_doses>13, 13, n_doses)),
n_doses = if_else(n_doses == "13", ">12", n_doses),
n_doses = factor(n_doses, levels = dose_order)
)%>%
ggplot(aes(x=n_doses, y=percent, colour = study_motivation,
fill = study_motivation, group = study_motivation)) +
geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_line(linewidth = 1) + # linewidth replaces size for lines in ggplot2 3.4.0 and later
geom_text(aes(label = percent), vjust=-0.6, size=3.5, color="black") +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
facet_wrap(~study_motivation) +
theme(
legend.position = c(0.8, 0.9), # Placing the legend inside the plot, towards the top-right
legend.justification = c(0, 1) # Anchoring the legend box by its top-left corner at that position
) +
labs(
x = "",
y = "Percentage of data"
)
doses_fig
setwd(figures_path)
ggsave("comp_doses_fig.pdf", plot = doses_fig, width = 8.3, height = 11.7/3)
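The plot above relabels any number of doses above 12 as ">12". If a single value per bin is wanted, the counts can be summed after relabelling. This is a minimal base-R sketch of that variation, not the method used in the script; it uses the last four rows of the n_doses_summary table and illustrative object names.

```r
# Last four rows of the "All" dose table (n_doses and their counts)
doses <- data.frame(
  n_doses = c(11, 12, 13, 17),
  n = c(10, 5, 2, 1)
)

# Relabel anything above 12 doses as ">12"
doses$dose_bin <- ifelse(doses$n_doses > 12, ">12", as.character(doses$n_doses))

# Sum the counts within each bin so ">12" appears once
binned <- aggregate(n ~ dose_bin, data = doses, FUN = sum)

# Order the bins as 1, 2, ..., 12, ">12" for plotting
binned$dose_bin <- factor(binned$dose_bin, levels = c(1:12, ">12"))
binned <- binned[order(binned$dose_bin), ]
binned
```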
## 9.8 Concentrations
Here I will have a look at the min, max, and range of doses used in the database. For the MS, I am including only studies that reported in a mass to water volume measure so we can compare standardised units (ug/L). This was the most common reporting method (62% of all data; 1076 rows, as shown in the table below).
nrow <- EIPAAB_database %>%
nrow()
EIPAAB_database %>%
dplyr::group_by(compound_min_dose_unit_std) %>%
dplyr::reframe(n = n(),
prop = n/nrow) %>%
dplyr::arrange(desc(n))
## # A tibble: 7 × 3
## compound_min_dose_unit_std n prop
## <chr> <int> <dbl>
## 1 ug/L 1076 0.618
## 2 uM 397 0.228
## 3 <NA> 229 0.132
## 4 uM/L 25 0.0144
## 5 ppm 9 0.00517
## 6 uL/L 2 0.00115
## 7 ug/g 2 0.00115
Summary of the minimum and maximum concentrations used, and the range between them (where both were reported in mass to volume)
EIPAAB_database <- EIPAAB_database %>%
dplyr::mutate(range = compound_max_dose_std - compound_min_dose_std)
EIPAAB_database %>%
dplyr::filter(compound_min_dose_unit_std == "ug/L", compound_max_dose_unit_std == "ug/L") %>%
dplyr::mutate(range = compound_max_dose_std - compound_min_dose_std) %>%
dplyr::group_by(study_motivation) %>%
dplyr::reframe(median_min = median(compound_min_dose_std, na.rm = TRUE),
sd_min = sd(compound_min_dose_std, na.rm = TRUE),
min_min = min(compound_min_dose_std, na.rm = TRUE),
max_min = max(compound_min_dose_std, na.rm = TRUE),
median_max = median(compound_max_dose_std, na.rm = TRUE),
sd_max = sd(compound_max_dose_std, na.rm = TRUE),
min_max = min(compound_max_dose_std, na.rm = TRUE),
max_max = max(compound_max_dose_std, na.rm = TRUE),
median_range = median(range, na.rm = TRUE),
sd_range = sd(range, na.rm = TRUE),
min_range = min(range, na.rm = TRUE),
max_range = max(range, na.rm = TRUE))
## # A tibble: 3 × 13
## study_motivation median_min sd_min min_min max_min median_max sd_max min_max
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Environmental 0.995 25256. 3.13e-6 5 e5 100 1.41e5 0.001
## 2 Medical 1000 74501. 5 e-2 5.40e5 5000 2.07e5 0.05
## 3 Basic research 3000 5301683. 1 e-2 6 e7 10000 5.30e6 0.01
## # ℹ 5 more variables: max_max <dbl>, median_range <dbl>, sd_range <dbl>,
## # min_range <dbl>, max_range <dbl>
A plot for minimum doses. It is on a log axis (natural log) because the distribution is highly skewed.
motivation_colour_theme <- c("#60BD6C", "#D359A1", "#3C82C4")
min_conc_fig <- EIPAAB_database %>%
dplyr::filter(compound_min_dose_unit_std == "ug/L") %>%
ggplot(aes(x=log(compound_min_dose_std), fill = study_motivation, colour = study_motivation)) +
stat_slab(alpha = 0.6, linewidth = 1.5, colour = NA) +
stat_pointinterval(point_interval = "median_qi",
position = position_dodge(width = .4, preserve = "single"),
.width = c(0.89, 0.95)) +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation", guide = 'none') +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
labs(
x = "Natural log of minimum dose (ug/L)",
y = "Density"
) +
theme(legend.position="bottom")
min_conc_fig
setwd(figures_path)
ggsave("comp_min_conc_fig.pdf", plot = min_conc_fig, width = 5, height = 6)
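A summary table so we can see what the corresponding raw values are in the plot. Note that the quantiles below use probabilities of 0.11/0.89 and 0.05/0.95, whereas the 89% and 95% intervals drawn by median_qi in the plot correspond to 0.055/0.945 and 0.025/0.975, so the interval bounds in this table are narrower than those plotted.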
min_conc_summary <- EIPAAB_database %>%
dplyr::filter(compound_min_dose_unit_std == "ug/L") %>%
dplyr::group_by(study_motivation) %>%
dplyr::summarise(
median = median(log(compound_min_dose_std), na.rm = TRUE),
lower_89 = quantile(log(compound_min_dose_std), probs = 0.11, na.rm = TRUE),
upper_89 = quantile(log(compound_min_dose_std), probs = 0.89, na.rm = TRUE),
lower_95 = quantile(log(compound_min_dose_std), probs = 0.05, na.rm = TRUE),
upper_95 = quantile(log(compound_min_dose_std), probs = 0.95, na.rm = TRUE),
.groups = "drop"
) %>%
# Transform to a format suitable for ggplot annotation
pivot_longer(cols = -study_motivation, names_to = "stat", values_to = "value") %>%
dplyr::mutate(vaule_raw = format(exp(value), scientific = FALSE))
min_conc_summary
## # A tibble: 15 × 4
## study_motivation stat value vaule_raw
## <fct> <chr> <dbl> <chr>
## 1 Environmental median -0.00503 " 0.994987437"
## 2 Environmental lower_89 -5.41 " 0.004456651"
## 3 Environmental upper_89 5.54 " 254.538265740"
## 4 Environmental lower_95 -6.91 " 0.001000000"
## 5 Environmental upper_95 8.35 " 4240.690713691"
## 6 Medical median 6.91 " 1000.000000000"
## 7 Medical lower_89 3.10 " 22.211490447"
## 8 Medical upper_89 10.3 " 30000.000000000"
## 9 Medical lower_95 0 " 1.000000000"
## 10 Medical upper_95 11.5 "100000.000000000"
## 11 Basic research median 8.01 " 3000.000000000"
## 12 Basic research lower_89 -0.920 " 0.398400829"
## 13 Basic research upper_89 10.8 " 50802.311370061"
## 14 Basic research lower_95 -2.30 " 0.100000000"
## 15 Basic research upper_95 11.5 "100000.000000000"
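The `value` column above is on the natural-log scale, because `log()` in R uses base e unless told otherwise, and `exp()` is used to get back to ug/L. A minimal base-R sketch of the two transformations, using three doses that appear as raw values in the table above (object names are illustrative):

```r
# Three example doses in ug/L
dose_ug_l <- c(0.001, 1, 1000)

# Natural log, as used for the plot and the summary table
ln_dose <- log(dose_ug_l)
round(ln_dose, 2)

# Base-10 log, for comparison
log10_dose <- log10(dose_ug_l)
log10_dose

# Each scale needs its own back-transformation
back_from_ln <- exp(ln_dose)
back_from_log10 <- 10^log10_dose
```

So a plotted value of about -6.91 corresponds to 0.001 ug/L, and about 6.91 corresponds to 1000 ug/L.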
A plot for maximum doses. It is on a log axis (natural log) because the distribution is highly skewed.
motivation_colour_theme <- c("#60BD6C", "#D359A1", "#3C82C4")
max_conc_fig <- EIPAAB_database %>%
dplyr::filter(compound_min_dose_unit_std == "ug/L") %>%
ggplot(aes(x=log(compound_max_dose_std), fill = study_motivation, colour = study_motivation)) +
stat_slab(alpha = 0.6, linewidth = 1.5, colour = NA) +
stat_pointinterval(point_interval = "median_qi",
position = position_dodge(width = .4, preserve = "single"),
.width = c(0.89, 0.95)) +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation", guide = 'none') +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
labs(
x = "Natural log of maximum dose (ug/L)",
y = "Density"
) +
theme(legend.position="bottom")
max_conc_fig
setwd(figures_path)
ggsave("comp_max_conc_fig.pdf", plot = max_conc_fig, width = 5, height = 6)
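A summary table so we can see what the corresponding raw values are in the plot. As with the minimum doses, the quantiles below use probabilities of 0.11/0.89 and 0.05/0.95, so the interval bounds in this table are narrower than the 89% and 95% intervals plotted (which correspond to 0.055/0.945 and 0.025/0.975).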
max_conc_summary <- EIPAAB_database %>%
dplyr::filter(compound_min_dose_unit_std == "ug/L") %>%
dplyr::group_by(study_motivation) %>%
dplyr::summarise(
median = median(log(compound_max_dose_std), na.rm = TRUE),
lower_89 = quantile(log(compound_max_dose_std), probs = 0.11, na.rm = TRUE),
upper_89 = quantile(log(compound_max_dose_std), probs = 0.89, na.rm = TRUE),
lower_95 = quantile(log(compound_max_dose_std), probs = 0.05, na.rm = TRUE),
upper_95 = quantile(log(compound_max_dose_std), probs = 0.95, na.rm = TRUE),
.groups = "drop"
) %>%
# Transform to a format suitable for ggplot annotation
pivot_longer(cols = -study_motivation, names_to = "stat", values_to = "value") %>%
dplyr::mutate(vaule_raw = format(exp(value), scientific = FALSE))
max_conc_summary
## # A tibble: 15 × 4
## study_motivation stat value vaule_raw
## <fct> <chr> <dbl> <chr>
## 1 Environmental median 4.61 " 100.000000000"
## 2 Environmental lower_89 -3.00 " 0.049652545"
## 3 Environmental upper_89 10.1 " 23247.868374564"
## 4 Environmental lower_95 -4.62 " 0.009873911"
## 5 Environmental upper_95 11.5 "100000.000000000"
## 6 Medical median 8.52 " 5000.000000000"
## 7 Medical lower_89 4.42 " 82.775528294"
## 8 Medical upper_89 11.5 "100000.000000000"
## 9 Medical lower_95 2.98 " 19.686403036"
## 10 Medical upper_95 12.9 "408740.139635277"
## 11 Basic research median 9.21 " 10000.000000000"
## 12 Basic research lower_89 0.895 " 2.446887175"
## 13 Basic research upper_89 11.5 "100000.000000000"
## 14 Basic research lower_95 -2.30 " 0.100000000"
## 15 Basic research upper_95 12.6 "300000.000000000"
A plot for the range of doses. It is on a log axis (natural log) because the distribution is highly skewed. This includes only studies that had more than one dose and reported concentration in a mass to volume metric.
motivation_colour_theme <- c("#60BD6C", "#D359A1", "#3C82C4")
range_conc_fig <- EIPAAB_database %>%
dplyr::filter(compound_min_dose_unit_std == "ug/L") %>%
dplyr::filter(range > 0) %>%
ggplot(aes(x=log(range), fill = study_motivation, colour = study_motivation)) +
stat_slab(alpha = 0.6, linewidth = 1.5, colour = NA) +
stat_pointinterval(point_interval = "median_qi",
position = position_dodge(width = .4, preserve = "single"),
.width = c(0.89, 0.95)) +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation", guide = 'none') +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
labs(
x = "Natural log of range (ug/L)",
y = "Density"
) +
theme(legend.position="bottom")
range_conc_fig
setwd(figures_path)
ggsave("comp_range_conc_fig.pdf", plot = range_conc_fig, width = 5, height = 6)
A summary table so we can see what the corresponding raw values are in the plot
range_conc_summary <- EIPAAB_database %>%
dplyr::filter(compound_min_dose_unit_std == "ug/L",
range > 0) %>%
dplyr::group_by(study_motivation) %>%
summarise(
median = median(log(range), na.rm = TRUE),
lower_89 = quantile(log(range), probs = 0.055, na.rm = TRUE),
upper_89 = quantile(log(range), probs = 0.945, na.rm = TRUE),
lower_95 = quantile(log(range), probs = 0.025, na.rm = TRUE),
upper_95 = quantile(log(range), probs = 0.975, na.rm = TRUE)
) %>%
# Transform to a format suitable for ggplot annotation
pivot_longer(cols = -study_motivation, names_to = "stat", values_to = "value") %>%
dplyr::mutate(vaule_raw = format(exp(value), scientific = FALSE))
range_conc_summary
## # A tibble: 15 × 4
## study_motivation stat value vaule_raw
## <fct> <chr> <dbl> <chr>
## 1 Environmental median 5.38 " 217.80000000"
## 2 Environmental lower_89 -3.00 " 0.05000498"
## 3 Environmental upper_89 11.5 " 99995.93990911"
## 4 Environmental lower_95 -4.07 " 0.01704190"
## 5 Environmental upper_95 13.1 "491792.33880353"
## 6 Medical median 9.10 " 9000.00000000"
## 7 Medical lower_89 4.33 " 75.72276785"
## 8 Medical upper_89 13.2 "539495.65697997"
## 9 Medical lower_95 3.42 " 30.45013206"
## 10 Medical upper_95 13.6 "846549.05789480"
## 11 Basic research median 9.85 " 19000.00000000"
## 12 Basic research lower_89 0.0872 " 1.09108007"
## 13 Basic research upper_89 12.5 "270000.00000000"
## 14 Basic research lower_95 -0.0102 " 0.98987944"
## 15 Basic research upper_95 12.6 "299042.91267704"
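The probabilities used in the summary above (0.055/0.945 and 0.025/0.975) follow from the interval widths given to the plot, since a quantile interval of a given width leaves half of the remaining probability in each tail. A minimal base-R sketch, where `qi_probs()` is a hypothetical helper and the data are simulated purely for illustration:

```r
# Hypothetical helper: lower and upper probabilities for an
# equal-tailed quantile interval of a given width
qi_probs <- function(width) {
  tail_area <- (1 - width) / 2
  c(lower = tail_area, upper = 1 - tail_area)
}

probs_89 <- qi_probs(0.89)
probs_95 <- qi_probs(0.95)
probs_89
probs_95

# Applying the probabilities to simulated log-scale values
set.seed(1)
toy_log_range <- rnorm(200)
interval_89 <- quantile(toy_log_range, probs = probs_89, names = FALSE)
interval_95 <- quantile(toy_log_range, probs = probs_95, names = FALSE)
```

Because the 95% interval leaves less probability in the tails, it always contains the 89% interval.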
env_min_conc_fig <- EIPAAB_database %>%
dplyr::filter(study_motivation == "Environmental") %>%
dplyr::filter(compound_min_dose_unit_std == "ug/L") %>%
ggplot(aes(x=log(compound_min_dose_std))) +
stat_slab(alpha = 0.8, linewidth = 1.5) + # fixed values are set outside aes(), which is for mapping variables
stat_pointinterval(point_interval = "median_qi",
position = position_dodge(width = .4, preserve = "single"),
.width = c(0.89, 0.95)) +
theme_classic() +
theme(legend.position="none")
env_min_conc_fig
A summary table so we can see what the corresponding raw values are in the plot
env_conc_summary <- EIPAAB_database %>%
filter(study_motivation == "Environmental",
compound_min_dose_unit_std == "ug/L") %>%
summarise(
median = median(log(compound_min_dose_std), na.rm = TRUE),
lower_89 = quantile(log(compound_min_dose_std), probs = 0.055, na.rm = TRUE),
upper_89 = quantile(log(compound_min_dose_std), probs = 0.945, na.rm = TRUE),
lower_95 = quantile(log(compound_min_dose_std), probs = 0.025, na.rm = TRUE),
upper_95 = quantile(log(compound_min_dose_std), probs = 0.975, na.rm = TRUE)
) %>%
# Transform to a format suitable for ggplot annotation
pivot_longer(cols = everything(), names_to = "stat", values_to = "value") %>%
dplyr::mutate(vaule_raw = format(exp(value), scientific = FALSE))
env_conc_summary
## # A tibble: 5 × 3
## stat value vaule_raw
## <chr> <dbl> <chr>
## 1 median -0.00503 " 0.9949874371"
## 2 lower_89 -6.91 " 0.0010000000"
## 3 upper_89 7.79 " 2425.8399279662"
## 4 lower_95 -7.49 " 0.0005588055"
## 5 upper_95 9.67 "15901.5600424659"
Where the exposure itself was conducted
EIPAAB_database %>%
dplyr::group_by(compound_exposure_location) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(total = sum(n),
percent = round(n/total*100,1))
## # A tibble: 3 × 4
## compound_exposure_location n total percent
## <chr> <int> <int> <dbl>
## 1 Indoor laboratory setting or assumed indoors 1730 1740 99.4
## 2 Outdoor natural setting 4 1740 0.2
## 3 Outdoor restricted setting (cannot interact with wild sp… 6 1740 0.3
First I will make a new variable called behav_category_n, which counts how many of our 10 broad behavioural categories were measured in the article.
The 10 over-arching categories were: (1) movement and locomotion, (2) pre-mating and mating behaviour, (3) post-mating behaviour, (4) aggression, (5) sociality, (6) cognition and learning, (7) anxiety and boldness, (8) foraging and feeding, (9) antipredator behaviour, and (10) other behaviours not categorised.
This takes the sum of all the behaviour category indicators, so it can range from 1 (a single behavioural category) to 10 (all categories).
EIPAAB_database <- EIPAAB_database %>%
dplyr::mutate(behav_category_n = rowSums(across(starts_with("behav_") & ends_with("_boolean"))))
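To show what the row-wise sum is doing, here is a minimal base-R sketch on a toy data frame. The column names follow the behav_*_boolean pattern used in the database, but the rows and values are made up for illustration.

```r
# Toy data: three rows, three boolean behaviour columns, one non-boolean column
toy <- data.frame(
  article_id = c("a1", "a2", "a3"),
  behav_movement_boolean = c(1L, 1L, 0L),
  behav_boldness_boolean = c(0L, 1L, 0L),
  behav_foraging_boolean = c(0L, 1L, 1L),
  behav_test_location = c("Lab", "Lab", "Field")
)

# Keep only columns that start with "behav_" AND end with "_boolean"
boolean_cols <- grepl("^behav_", names(toy)) & grepl("_boolean$", names(toy))

# Sum the 0/1 indicators across each row
toy$behav_category_n <- rowSums(toy[, boolean_cols])
toy$behav_category_n
```

The second row measured all three toy categories, so its count is 3, while the other rows each measured one.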
The majority of evidence seems to be based on a single behavioural category
EIPAAB_database %>%
dplyr::group_by(behav_category_n) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(percent = round(n/sum(n)*100,1))
## # A tibble: 6 × 3
## behav_category_n n percent
## <dbl> <int> <dbl>
## 1 1 1206 69.3
## 2 2 400 23
## 3 3 115 6.6
## 4 4 16 0.9
## 5 5 2 0.1
## 6 7 1 0.1
Is this the same by study motivation?
EIPAAB_database %>%
dplyr::group_by(behav_category_n, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1))
## # A tibble: 14 × 5
## behav_category_n study_motivation n total_motivation percent
## <dbl> <fct> <int> <int> <dbl>
## 1 1 Environmental 593 858 69.1
## 2 1 Medical 379 519 73
## 3 1 Basic research 234 363 64.5
## 4 2 Environmental 180 858 21
## 5 2 Medical 110 519 21.2
## 6 2 Basic research 110 363 30.3
## 7 3 Environmental 75 858 8.7
## 8 3 Medical 25 519 4.8
## 9 3 Basic research 15 363 4.1
## 10 4 Environmental 8 858 0.9
## 11 4 Medical 4 519 0.8
## 12 4 Basic research 4 363 1.1
## 13 5 Environmental 2 858 0.2
## 14 7 Medical 1 519 0.2
Here I make a dataframe where I have pivoted the data to long format based on each of the 10 behaviour categories. This dataframe can be used to ask more specific questions about the relationship between species, compound, and behaviour.
But first let’s use it to see what behaviours are most common overall, and within each study motivation.
binary_behav <- EIPAAB_database %>%
dplyr::select((starts_with("behav_") & ends_with("_boolean"))) %>%
colnames()
PICO_long <- EIPAAB_database %>%
tidyr::pivot_longer(.,
cols = all_of(binary_behav),
names_to = "behav_category",
values_to = "value") %>%
dplyr::select(article_id, study_motivation, species_name, species_class,
compound_name, compound_atc_level_3, behav_category, value) %>%
dplyr::mutate(behav_category = behav_category %>% str_remove("behav_") %>% str_remove("_boolean"))
PICO_long
## # A tibble: 17,400 × 8
## article_id study_motivation species_name species_class compound_name
## <chr> <fct> <chr> <chr> <chr>
## 1 236660465 Environmental Danio rerio Actinopterygii Buspirone
## 2 236660465 Environmental Danio rerio Actinopterygii Buspirone
## 3 236660465 Environmental Danio rerio Actinopterygii Buspirone
## 4 236660465 Environmental Danio rerio Actinopterygii Buspirone
## 5 236660465 Environmental Danio rerio Actinopterygii Buspirone
## 6 236660465 Environmental Danio rerio Actinopterygii Buspirone
## 7 236660465 Environmental Danio rerio Actinopterygii Buspirone
## 8 236660465 Environmental Danio rerio Actinopterygii Buspirone
## 9 236660465 Environmental Danio rerio Actinopterygii Buspirone
## 10 236660465 Environmental Danio rerio Actinopterygii Buspirone
## # ℹ 17,390 more rows
## # ℹ 3 more variables: compound_atc_level_3 <chr>, behav_category <chr>,
## # value <int>
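The long table above has 17,400 rows because each of the 1,740 database rows is repeated once for each of the 10 behaviour categories. As a rough base-R sketch of the same reshaping (the script itself uses tidyr::pivot_longer; the toy data and object names here are made up for illustration):

```r
# Toy data: two rows and three boolean behaviour columns
toy <- data.frame(
  article_id = c("a1", "a2"),
  behav_movement_boolean = c(1L, 1L),
  behav_boldness_boolean = c(0L, 1L),
  behav_foraging_boolean = c(0L, 0L)
)

bool_cols <- grep("^behav_.*_boolean$", names(toy), value = TRUE)

# Stack the boolean columns on top of each other, one block per category
toy_long <- data.frame(
  article_id = rep(toy$article_id, times = length(bool_cols)),
  behav_category = rep(sub("_boolean$", "", sub("^behav_", "", bool_cols)),
                       each = nrow(toy)),
  value = unlist(toy[bool_cols], use.names = FALSE)
)

# Number of rows measuring each category
category_counts <- tapply(toy_long$value, toy_long$behav_category, sum)
category_counts
```

Two rows and three categories give six long rows, and summing `value` within each category gives the counts used for the behaviour summaries.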
behav_overall <- PICO_long %>%
dplyr::group_by(behav_category) %>%
reframe(n = sum(value)) %>%
dplyr::mutate(percent = round(n/sum(n)*100, 1)) %>%
dplyr::arrange(desc(n))
behav_overall
## # A tibble: 10 × 3
## behav_category n percent
## <chr> <int> <dbl>
## 1 movement 983 40.4
## 2 boldness 568 23.4
## 3 foraging 190 7.8
## 4 agression 145 6
## 5 sociality 143 5.9
## 6 mating 122 5
## 7 noncat 96 3.9
## 8 cognition 90 3.7
## 9 antipredator 85 3.5
## 10 post_mating 10 0.4
behav_motivation <- PICO_long %>%
dplyr::group_by(behav_category, study_motivation) %>%
dplyr::summarise(n = sum(value), .groups = 'drop') %>%
tidyr::complete(behav_category, study_motivation, fill = list(n = 0)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation) %>%
dplyr::arrange(desc(study_motivation))
behav_overall <- behav_motivation %>%
dplyr::group_by(behav_category) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(percent = round(n/sum(n)*100,1),
study_motivation = "Overall")
behav_motivation <- rbind(behav_overall, behav_motivation)
behav_motivation
## # A tibble: 40 × 4
## behav_category n percent study_motivation
## <chr> <int> <dbl> <chr>
## 1 agression 145 6 Overall
## 2 antipredator 85 3.5 Overall
## 3 boldness 568 23.4 Overall
## 4 cognition 90 3.7 Overall
## 5 foraging 190 7.8 Overall
## 6 mating 122 5 Overall
## 7 movement 983 40.4 Overall
## 8 noncat 96 3.9 Overall
## 9 post_mating 10 0.4 Overall
## 10 sociality 143 5.9 Overall
## # ℹ 30 more rows
motivation_colour_theme <- c("grey", "#60BD6C", "#D359A1", "#3C82C4")
behav_order <- c("movement", "boldness", "foraging", "antipredator", "mating", "post_mating", "agression", "sociality", "cognition", "noncat")
study_motivation_order <- c("Overall", "Environmental", "Medical", "Basic research")
behav_motivation_fig <- behav_motivation %>%
dplyr::mutate(behav_category = factor(behav_category, levels = rev(behav_order)),
study_motivation = factor(study_motivation, levels = study_motivation_order)) %>%
ggplot(aes(x=behav_category, y=percent, colour = study_motivation,
fill = study_motivation, group = study_motivation)) +
geom_col(width = 0.1, colour = NA) +
geom_point(size = 3) +
geom_text(aes(label = percent), vjust = -0.3, size=3.5, color="black") +
theme_classic() +
facet_grid(cols = vars(study_motivation)) +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
coord_flip() +
labs(
x = "",
y = "Percentage"
) +
theme(
plot.title = element_text(size = 11)
)
behav_motivation_fig
setwd(figures_path)
ggsave("behav_motivation_fig.pdf", plot = behav_motivation_fig, width = 10, height = 5)
behav_select <- EIPAAB_database %>%
dplyr::select((starts_with("behav_") & !ends_with("_boolean") & !ends_with("is_social_context") & !ends_with("test_location") & !ends_with("category_n"))) %>%
colnames()
behav_sub_cat_long <- EIPAAB_database %>%
tidyr::pivot_longer(.,
cols = all_of(behav_select),
names_to = "parent_category",
values_to = "sub_category") %>%
dplyr::select(article_id, study_motivation, species_name, species_class,
compound_name, compound_atc_level_3, parent_category, sub_category) %>%
dplyr::mutate(parent_category = parent_category %>% str_remove("behav_")) %>%
tidyr::separate_rows(sub_category, sep = ";") %>%
dplyr::filter(!is.na(sub_category))
behav_sub_cat_summary <- behav_sub_cat_long %>%
dplyr::group_by(study_motivation, parent_category, sub_category) %>%
dplyr::summarise(n_sub_cat = n(), .groups = 'drop') %>%
dplyr::group_by(study_motivation, parent_category) %>%
dplyr::mutate(n_parent = sum(n_sub_cat)) %>%
dplyr::ungroup() %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(n_motivation = sum(n_sub_cat)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent_sub_cat = n_sub_cat/n_parent,
percent_parent = n_parent/n_motivation)
behav_sub_cat_summary
## # A tibble: 155 × 8
## study_motivation parent_category sub_category n_sub_cat n_parent n_motivation
## <fct> <chr> <chr> <int> <int> <int>
## 1 Environmental agression aggression … 27 70 1511
## 2 Environmental agression aggression … 10 70 1511
## 3 Environmental agression aggression … 20 70 1511
## 4 Environmental agression aggression … 7 70 1511
## 5 Environmental agression locomotor a… 6 70 1511
## 6 Environmental antipredator locomotor a… 22 100 1511
## 7 Environmental antipredator response to… 10 100 1511
## 8 Environmental antipredator response to… 6 100 1511
## 9 Environmental antipredator response to… 20 100 1511
## 10 Environmental antipredator response to… 42 100 1511
## # ℹ 145 more rows
## # ℹ 2 more variables: percent_sub_cat <dbl>, percent_parent <dbl>
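The sub-category columns can hold several values separated by ";", which is why the code above splits them into separate rows before counting. A minimal base-R sketch of that step (the script uses tidyr::separate_rows; the sub-category labels "sub A" and "sub B" are placeholders, not real database values):

```r
# Toy data: one row with two sub-categories, one with a single value, one missing
toy <- data.frame(
  article_id = c("a1", "a2", "a3"),
  behav_movement = c("sub A;sub B", "sub A", NA),
  stringsAsFactors = FALSE
)

# Split each entry on ";" (a missing entry stays as a single NA)
pieces <- strsplit(toy$behav_movement, ";", fixed = TRUE)

# Repeat the article id once per piece, then drop the missing sub-categories
toy_long <- data.frame(
  article_id = rep(toy$article_id, times = lengths(pieces)),
  sub_category = unlist(pieces)
)
toy_long <- toy_long[!is.na(toy_long$sub_category), ]

# Count how often each sub-category was measured
sub_counts <- table(toy_long$sub_category)
sub_counts
```

Note that a row measuring two sub-categories contributes to both counts, so the sub-category totals can exceed the number of database rows.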
ring_plot_subcat <- behav_sub_cat_summary %>%
dplyr::group_by(study_motivation, parent_category) %>%
dplyr::mutate(ymax = cumsum(percent_sub_cat),
ymin = lag(ymax,1),
ymin = if_else(is.na(ymin), 0, ymin),
labelPosition = (ymax+ymin)/2,
label = paste0(sub_category, "\n (n = ", n_sub_cat, ")")) %>%
dplyr::ungroup()
First I make a complete dataset (adding zeros for sub-categories that are missing in a given motivation), and order it by the overall prevalence of the sub-categories.
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "movement") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
movement_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "movement") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0, n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
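# Reset the palette to the three motivation colours; the four-colour version defined for the previous figure starts with grey for "Overall"
motivation_colour_theme <- c("#60BD6C", "#D359A1", "#3C82C4")
beh_movement_subcat_fig <- movement_subcat %>%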
dplyr::mutate(percent_sub_cat = round(percent_sub_cat,3)*100) %>%
ggplot(aes(x=sub_category, y=percent_sub_cat, colour = study_motivation, fill = study_motivation, group = study_motivation)) +
geom_col(position = position_dodge(width = 0.8), width = 0.1) +
geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust=-0.6, size=3.5, color="black", position = position_dodge(width = 0.8)) +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
coord_flip() +
labs(
x = "",
y = "Percentage of data"
) +
theme(
legend.position = "none"
)
beh_movement_subcat_fig
n_subcat <- movement_subcat %>%
dplyr::distinct(sub_category) %>%
nrow(.)/10
setwd(figures_path)
ggsave("beh_movement_subcat_fig.pdf", plot = beh_movement_subcat_fig, width = 10, height = 11.7*n_subcat)
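The completion and ordering step used for the movement sub-categories can be sketched in base R as follows. The script itself uses tidyr::complete; the sub-category names ("distance", "speed") and counts here are hypothetical and only serve to show the mechanics.

```r
# Toy counts: "speed" was never measured in Medical studies
counts <- data.frame(
  sub_category = c("distance", "distance", "speed"),
  study_motivation = c("Environmental", "Medical", "Environmental"),
  n_sub_cat = c(5L, 3L, 2L)
)

# Every combination of sub-category and motivation
all_combos <- expand.grid(
  sub_category = unique(counts$sub_category),
  study_motivation = unique(counts$study_motivation),
  stringsAsFactors = FALSE
)

# Join the counts on, then replace the missing combination with zero
completed <- merge(all_combos, counts, all.x = TRUE)
completed$n_sub_cat[is.na(completed$n_sub_cat)] <- 0L

# Order sub-categories from least to most common overall
totals <- tapply(completed$n_sub_cat, completed$sub_category, sum)
sub_category_order <- names(sort(totals))
completed$sub_category <- factor(completed$sub_category, levels = sub_category_order)
completed
```

Ordering from least to most common means that, after coord_flip(), the most common sub-category ends up at the top of the plot.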
If you would like to make a doughnut chart, here’s the code. However, for categories that have 5 or more sub-categories, like movement, I don’t think this is the clearest way to present the data.
movement_subcat %>%
dplyr::arrange(sub_category) %>%
dplyr::arrange(study_motivation) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(ymax = cumsum(percent_sub_cat),
ymin = lag(ymax,1),
ymin = if_else(is.na(ymin), 0, ymin),
labelPosition = (ymax+ymin)/2,
label = if_else(n_sub_cat == 0, NA, n_sub_cat)) %>%
dplyr::ungroup() %>%
ggplot(aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3, fill=sub_category)) +
geom_rect() +
coord_polar(theta="y") +
geom_label(x=4, aes(y=labelPosition, label=label), size=3, alpha = 0.8) +
facet_wrap(~study_motivation) +
xlim(c(2, 5)) +
theme_void() +
theme(legend.position = "bottom")
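The doughnut chart draws each segment as a rectangle running from ymin to ymax before the polar transformation, so the segment edges are just running totals of the proportions. A minimal base-R sketch with made-up proportions (object names are illustrative):

```r
# Toy proportions for three sub-categories within one motivation (sum to 1)
percent_sub_cat <- c(0.5, 0.3, 0.2)

# Upper edge of each segment is the running total
ymax <- cumsum(percent_sub_cat)

# Lower edge is the previous upper edge, starting from zero
ymin <- c(0, head(ymax, -1))

# Labels sit halfway along each segment
label_position <- (ymax + ymin) / 2

data.frame(percent_sub_cat, ymin, ymax, label_position)
```

This gives the same edges as the cumsum() and lag() combination in the script, with the first lower edge set to 0 instead of NA.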
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "boldness") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
boldness_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "boldness") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0, n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_boldness_subcat_fig <- boldness_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat,3)*100) %>%
ggplot(aes(x=sub_category, y=percent_sub_cat, colour = study_motivation, fill = study_motivation, group = study_motivation)) +
geom_col(position = position_dodge(width = 0.8), width = 0.1) +
geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust=-0.6, size=3.5, color="black", position = position_dodge(width = 0.8)) +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
coord_flip() +
labs(
x = "",
y = "Percentage of data"
) +
theme(
legend.position = "none"
)
beh_boldness_subcat_fig
n_subcat <- boldness_subcat %>%
dplyr::distinct(sub_category) %>%
nrow(.)/10
setwd(figures_path)
ggsave("beh_boldness_subcat_fig.pdf", plot = beh_boldness_subcat_fig, width = 10, height = 11.7*n_subcat)
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "foraging") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
foraging_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "foraging") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0, n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_foraging_subcat_fig <- foraging_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat,3)*100) %>%
ggplot(aes(x=sub_category, y=percent_sub_cat, colour = study_motivation, fill = study_motivation, group = study_motivation)) +
geom_col(position = position_dodge(width = 0.8), width = 0.1) +
geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust=-0.6, size=3.5, color="black", position = position_dodge(width = 0.8)) +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
coord_flip() +
labs(
x = "",
y = "Percentage of data"
) +
theme(
legend.position = "none"
)
beh_foraging_subcat_fig
n_subcat <- foraging_subcat %>%
dplyr::distinct(sub_category) %>%
nrow(.)/10
setwd(figures_path)
ggsave("beh_foraging_subcat_fig.pdf", plot = beh_foraging_subcat_fig, width = 10, height = 11.7*n_subcat)
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "antipredator") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
antipredator_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "antipredator") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0, n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_antipredator_subcat_fig <- antipredator_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat,3)*100) %>%
ggplot(aes(x=sub_category, y=percent_sub_cat, colour = study_motivation, fill = study_motivation, group = study_motivation)) +
geom_col(position = position_dodge(width = 0.8), width = 0.1) +
geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust=-0.6, size=3.5, color="black", position = position_dodge(width = 0.8)) +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
coord_flip() +
labs(
x = "",
y = "Percentage of data"
) +
theme(
legend.position = "none"
)
beh_antipredator_subcat_fig
n_subcat <- antipredator_subcat %>%
dplyr::distinct(sub_category) %>%
nrow(.)/10
setwd(figures_path)
ggsave("beh_antipredator_subcat_fig.pdf", plot = beh_antipredator_subcat_fig, width = 10, height = 11.7*n_subcat)
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "mating") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
mating_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "mating") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0, n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_mating_subcat_fig <- mating_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat,3)*100) %>%
ggplot(aes(x=sub_category, y=percent_sub_cat, colour = study_motivation, fill = study_motivation, group = study_motivation)) +
geom_col(position = position_dodge(width = 0.8), width = 0.1) +
geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust=-0.6, size=3.5, color="black", position = position_dodge(width = 0.8)) +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
coord_flip() +
labs(
x = "",
y = "Percentage of data"
) +
theme(
legend.position = "none"
)
beh_mating_subcat_fig
n_subcat <- mating_subcat %>%
dplyr::distinct(sub_category) %>%
nrow(.)/10
setwd(figures_path)
ggsave("beh_mating_subcat_fig.pdf", plot = beh_mating_subcat_fig, width = 10, height = 11.7*n_subcat)
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "post_mating") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
post_mating_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "post_mating") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0, n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_post_mating_subcat_fig <- post_mating_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat,3)*100) %>%
ggplot(aes(x=sub_category, y=percent_sub_cat, colour = study_motivation, fill = study_motivation, group = study_motivation)) +
geom_col(position = position_dodge(width = 0.8), width = 0.1) +
geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust=-0.6, size=3.5, color="black", position = position_dodge(width = 0.8)) +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
coord_flip() +
labs(
x = "",
y = "Percentage of data"
) +
theme(
legend.position = "none"
)
beh_post_mating_subcat_fig
n_subcat <- post_mating_subcat %>%
dplyr::distinct(sub_category) %>%
nrow(.)/10
setwd(figures_path)
ggsave("beh_post_mating_subcat_fig.pdf", plot = beh_post_mating_subcat_fig, width = 10, height = 11.7*n_subcat)
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "agression") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
agression_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "agression") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0, n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_agression_subcat_fig <- agression_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat,3)*100) %>%
ggplot(aes(x=sub_category, y=percent_sub_cat, colour = study_motivation, fill = study_motivation, group = study_motivation)) +
geom_col(position = position_dodge(width = 0.8), width = 0.1) +
geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust=-0.6, size=3.5, color="black", position = position_dodge(width = 0.8)) +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
coord_flip() +
labs(
x = "",
y = "Percentage of data"
) +
theme(
legend.position = "none"
)
beh_agression_subcat_fig
n_subcat <- agression_subcat %>%
dplyr::distinct(sub_category) %>%
nrow(.)/10
setwd(figures_path)
ggsave("beh_agression_subcat_fig.pdf", plot = beh_agression_subcat_fig, width = 10, height = 11.7*n_subcat)
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "cognition") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
cognition_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "cognition") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0, n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_cognition_subcat_fig <- cognition_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat,3)*100) %>%
ggplot(aes(x=sub_category, y=percent_sub_cat, colour = study_motivation, fill = study_motivation, group = study_motivation)) +
geom_col(position = position_dodge(width = 0.8), width = 0.1) +
geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust=-0.6, size=3.5, color="black", position = position_dodge(width = 0.8)) +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
coord_flip() +
labs(
x = "",
y = "Percentage of data"
) +
theme(
legend.position = "none"
)
beh_cognition_subcat_fig
n_subcat <- cognition_subcat %>%
dplyr::distinct(sub_category) %>%
nrow(.)/10
setwd(figures_path)
ggsave("beh_cognition_subcat_fig.pdf", plot = beh_cognition_subcat_fig, width = 10, height = 11.7*n_subcat)
sub_category_order <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "noncat") %>%
dplyr::group_by(sub_category) %>%
dplyr::reframe(n = sum(n_sub_cat)) %>%
dplyr::arrange(n) %>%
dplyr::pull(sub_category)
noncat_subcat <- behav_sub_cat_summary %>%
dplyr::filter(parent_category == "noncat") %>%
dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat, percent_parent) %>%
tidyr::complete(sub_category, study_motivation, fill = list(percent_sub_cat = 0, n_sub_cat = 0)) %>%
dplyr::mutate(sub_category = factor(sub_category, levels = sub_category_order))
beh_noncat_subcat_fig <- noncat_subcat %>%
dplyr::mutate(percent_sub_cat = round(percent_sub_cat,3)*100) %>%
ggplot(aes(x=sub_category, y=percent_sub_cat, colour = study_motivation, fill = study_motivation, group = study_motivation)) +
geom_col(position = position_dodge(width = 0.8), width = 0.1) +
geom_point(position = position_dodge(width = 0.8), size = 3) +
geom_text(aes(label = percent_sub_cat), hjust=-0.6, size=3.5, color="black", position = position_dodge(width = 0.8)) +
scale_colour_manual(values = motivation_colour_theme, name = "Study motivation") +
scale_fill_manual(values = motivation_colour_theme, name = "Study motivation") +
theme_classic() +
coord_flip() +
labs(
x = "",
y = "Percentage of data"
) +
theme(
legend.position = "none"
)
beh_noncat_subcat_fig
n_subcat <- noncat_subcat %>%
dplyr::distinct(sub_category) %>%
nrow(.)/10
setwd(figures_path)
ggsave("beh_noncat_subcat_fig.pdf", plot = beh_noncat_subcat_fig, width = 10, height = 11.7*n_subcat)
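The sub-category chunks above differ only in the parent category they filter on, so the pattern could be wrapped in a function. Below is a minimal sketch, not part of the original script: `plot_subcat()` is a hypothetical helper name, the toy data frame stands in for `behav_sub_cat_summary`, and the two hex colours are placeholders rather than the values in `motivation_colour_theme`.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper (not in the original script): builds the sub-category
# figure for one parent category.
plot_subcat <- function(summary_df, parent, colours) {
  # Order sub-categories from least to most common (summed across motivations)
  subcat_order <- summary_df %>%
    dplyr::filter(parent_category == parent) %>%
    dplyr::group_by(sub_category) %>%
    dplyr::summarise(n = sum(n_sub_cat), .groups = "drop") %>%
    dplyr::arrange(n) %>%
    dplyr::pull(sub_category)

  summary_df %>%
    dplyr::filter(parent_category == parent) %>%
    dplyr::select(study_motivation, sub_category, percent_sub_cat, n_sub_cat) %>%
    # Add explicit zeros for sub-category x motivation combinations with no data
    tidyr::complete(sub_category, study_motivation,
                    fill = list(percent_sub_cat = 0, n_sub_cat = 0)) %>%
    dplyr::mutate(sub_category = factor(sub_category, levels = subcat_order),
                  percent_sub_cat = round(percent_sub_cat, 3) * 100) %>%
    ggplot(aes(x = sub_category, y = percent_sub_cat,
               colour = study_motivation, fill = study_motivation,
               group = study_motivation)) +
    geom_col(position = position_dodge(width = 0.8), width = 0.1) +
    geom_point(position = position_dodge(width = 0.8), size = 3) +
    geom_text(aes(label = percent_sub_cat), hjust = -0.6, size = 3.5,
              colour = "black", position = position_dodge(width = 0.8)) +
    scale_colour_manual(values = colours, name = "Study motivation") +
    scale_fill_manual(values = colours, name = "Study motivation") +
    theme_classic() +
    coord_flip() +
    labs(x = "", y = "Percentage of data") +
    theme(legend.position = "none")
}

# Toy data standing in for behav_sub_cat_summary (all values are made up)
toy_summary <- tibble::tibble(
  parent_category  = "boldness",
  study_motivation = c("Environmental", "Environmental", "Medical"),
  sub_category     = c("novel object", "open field", "open field"),
  percent_sub_cat  = c(0.25, 0.75, 1),
  n_sub_cat        = c(1, 3, 2)
)
# Placeholder colours, not the ones used in the original figures
toy_colours <- c(Environmental = "#60BD6C", Medical = "#D359A1")

toy_fig <- plot_subcat(toy_summary, "boldness", toy_colours)
toy_fig
```

With the real objects, a call such as `plot_subcat(behav_sub_cat_summary, "boldness", motivation_colour_theme)` should give the same figure as the boldness chunk, assuming the column names match those used above.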
Check where behaviour was measured.
behav_location_summary <- EIPAAB_database %>%
dplyr::group_by(study_motivation, behav_test_location) %>%
dplyr::reframe(n = n()) %>%
tidyr::separate_rows(behav_test_location, sep = ";") %>%
dplyr::group_by(study_motivation, behav_test_location) %>%
dplyr::reframe(n = sum(n)) %>%
tidyr::complete(behav_test_location, study_motivation, fill = list(n = 0)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
behav_location_overall <- behav_location_summary %>%
dplyr::group_by(behav_test_location) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(percent = round(n/sum(n)*100,1),
study_motivation = "Overall")
behav_location_summary <- rbind(behav_location_overall, behav_location_summary) %>%
dplyr::arrange(study_motivation)
behav_location_summary
## # A tibble: 12 × 4
## behav_test_location n percent study_motivation
## <chr> <int> <dbl> <chr>
## 1 indoor laboratory setting or assumed indoors 363 99.7 Basic research
## 2 outdoor natural setting 1 0.3 Basic research
## 3 outdoor restricted setting (cannot interact w… 0 0 Basic research
## 4 indoor laboratory setting or assumed indoors 852 98.7 Environmental
## 5 outdoor natural setting 8 0.9 Environmental
## 6 outdoor restricted setting (cannot interact w… 3 0.3 Environmental
## 7 indoor laboratory setting or assumed indoors 519 99.6 Medical
## 8 outdoor natural setting 1 0.2 Medical
## 9 outdoor restricted setting (cannot interact w… 1 0.2 Medical
## 10 indoor laboratory setting or assumed indoors 1734 99.2 Overall
## 11 outdoor natural setting 10 0.6 Overall
## 12 outdoor restricted setting (cannot interact w… 4 0.2 Overall
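The summary above splits semicolon-separated locations before counting, so a row that lists two locations contributes to both. Here is a small sketch of that step on made-up data (the toy values and the shortened location names are assumptions for illustration only).

```r
library(dplyr)
library(tidyr)

# Made-up rows: the second one lists two test locations in a single cell
toy_db <- tibble::tibble(
  study_motivation    = c("Environmental", "Environmental", "Medical"),
  behav_test_location = c("indoor", "indoor;outdoor", "indoor")
)

toy_location <- toy_db %>%
  # One row per location mentioned
  tidyr::separate_rows(behav_test_location, sep = ";") %>%
  dplyr::count(study_motivation, behav_test_location) %>%
  # Add explicit zeros for combinations that never occur
  tidyr::complete(behav_test_location, study_motivation, fill = list(n = 0L)) %>%
  dplyr::group_by(study_motivation) %>%
  dplyr::mutate(percent = round(n / sum(n) * 100, 1)) %>%
  dplyr::ungroup()

toy_location
```

Note that the percentages are relative to the number of location mentions within each motivation, not the number of database rows, because a row with two locations is counted twice.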
Check how behaviour was scored (e.g. live, manually from video, or with automated tracking). This uses one randomly selected row per article.
behav_behav_scoring_summary <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(study_motivation, validity_behav_scoring_method) %>%
dplyr::reframe(n = n()) %>%
tidyr::separate_rows(validity_behav_scoring_method, sep = ";") %>%
dplyr::group_by(study_motivation, validity_behav_scoring_method) %>%
dplyr::reframe(n = sum(n)) %>%
tidyr::complete(validity_behav_scoring_method, study_motivation, fill = list(n = 0)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
behav_behav_scoring_overall <- behav_behav_scoring_summary %>%
dplyr::group_by(validity_behav_scoring_method) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(percent = round(n/sum(n)*100,1),
study_motivation = "Overall")
behav_behav_scoring_summary <- rbind(behav_behav_scoring_overall, behav_behav_scoring_summary) %>%
dplyr::arrange(study_motivation)
behav_behav_scoring_summary
## # A tibble: 32 × 4
## validity_behav_scoring_method n percent study_motivation
## <chr> <int> <dbl> <chr>
## 1 acoustic analysis software 0 0 Basic research
## 2 live scoring in real time 17 10 Basic research
## 3 manual or human scoring from videos or image 43 25.3 Basic research
## 4 not specified 35 20.6 Basic research
## 5 other 0 0 Basic research
## 6 quantifying food consumption 2 1.2 Basic research
## 7 sensory for physical movement 2 1.2 Basic research
## 8 supervised automated tracking approaches 71 41.8 Basic research
## 9 acoustic analysis software 1 0.2 Environmental
## 10 live scoring in real time 52 9.6 Environmental
## # ℹ 22 more rows
behav_behav_scoring_summary %>%
dplyr::filter(study_motivation == "Overall")
## # A tibble: 8 × 4
## validity_behav_scoring_method n percent study_motivation
## <chr> <int> <dbl> <chr>
## 1 acoustic analysis software 1 0.1 Overall
## 2 live scoring in real time 84 8.6 Overall
## 3 manual or human scoring from videos or image 259 26.6 Overall
## 4 not specified 221 22.7 Overall
## 5 other 1 0.1 Overall
## 6 quantifying food consumption 21 2.2 Overall
## 7 sensory for physical movement 7 0.7 Overall
## 8 supervised automated tracking approaches 378 38.9 Overall
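Because `sample_n(1)` draws a random row from each article, the counts above could change slightly between runs if the scoring method differs between rows of the same article. The sketch below, on made-up data, shows two ways to make the selection repeatable and a quick check for within-article variation. It uses `slice_sample()` and `slice_head()`, which supersede `sample_n()` in current dplyr; the seed value is arbitrary.

```r
library(dplyr)

# Made-up data: article a1 reports two different scoring methods
toy_db <- tibble::tibble(
  article_id = c("a1", "a1", "a2", "a3", "a3", "a3"),
  validity_behav_scoring_method = c("live", "video", "video", "live", "live", "live")
)

# Option 1: keep the random draw but make it repeatable
set.seed(2024)  # arbitrary seed, chosen for this example only
one_random <- toy_db %>%
  dplyr::group_by(article_id) %>%
  dplyr::slice_sample(n = 1) %>%
  dplyr::ungroup()

# Option 2: always take the first row of each article
one_first <- toy_db %>%
  dplyr::group_by(article_id) %>%
  dplyr::slice_head(n = 1) %>%
  dplyr::ungroup()

# Check which articles have more than one value in this column
varies_within <- toy_db %>%
  dplyr::group_by(article_id) %>%
  dplyr::summarise(n_values = dplyr::n_distinct(validity_behav_scoring_method)) %>%
  dplyr::filter(n_values > 1)

varies_within
```

If `varies_within` has no rows for a given column in the real database, the choice of row does not matter for that column.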
Making a dataframe for a flow diagram (sankey plot)
PICO_df <- EIPAAB_database %>%
dplyr::mutate(behav_cat = case_when(
behav_movement_boolean == 1 ~ "Movement",
behav_boldness_boolean == 1 ~ "Boldness",
behav_foraging_boolean == 1 ~ "Foraging",
behav_antipredator_boolean == 1 ~ "Antipredator",
behav_mating_boolean == 1 ~ "Mating",
behav_post_mating_boolean == 1 ~ "Post mating",
behav_agression_boolean == 1 ~ "Agression",
behav_sociality_boolean == 1 ~ "Sociality",
behav_cognition_boolean == 1 ~ "Cognition",
behav_noncat_boolean == 1 ~ "Not categorised",
)) %>%
dplyr::select(study_motivation, compound_name, compound_atc_level_3, species_name, species_class, behav_cat)
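One thing to keep in mind with the code above: `case_when()` returns the value for the first condition that is true, so if a row were flagged for more than one behaviour category it would only be labelled with the one listed first. A toy illustration with two of the boolean columns (the data are made up):

```r
library(dplyr)

# Made-up rows: row 2 is flagged for both movement and boldness,
# row 4 is flagged for neither
toy_db <- tibble::tibble(
  behav_movement_boolean = c(1, 1, 0, 0),
  behav_boldness_boolean = c(0, 1, 1, 0)
)

toy_cat <- toy_db %>%
  dplyr::mutate(behav_cat = dplyr::case_when(
    behav_movement_boolean == 1 ~ "Movement",
    behav_boldness_boolean == 1 ~ "Boldness"
  ))

toy_cat

# How many rows are flagged for more than one category?
n_multi <- sum(rowSums(toy_db) > 1)
n_multi
```

Row 2 is labelled "Movement" only, and row 4 gets `NA` because no condition matches. Running the same `rowSums()` check on the real boolean columns would show whether this matters for the database.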
Let’s look at the 10 most common classes and ATCs
PICO_class_atc <- PICO_df %>%
dplyr::filter(!is.na(compound_atc_level_3), !is.na(species_class)) %>%
tidyr::separate_rows(compound_atc_level_3, sep = ";") %>%
dplyr::mutate(compound_atc_level_3 = str_trim(compound_atc_level_3))
PICO_atc_10 <- PICO_class_atc %>%
dplyr::group_by(compound_atc_level_3) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::pull(compound_atc_level_3)
PICO_class_10 <- PICO_class_atc %>%
dplyr::group_by(species_class) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:10) %>%
dplyr::pull(species_class)
behav_cat_order <- PICO_class_atc %>%
dplyr::filter(compound_atc_level_3 %in% PICO_atc_10 & species_class %in% PICO_class_10) %>%
dplyr::group_by(behav_cat) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
dplyr::pull(behav_cat)
PICO_class_atc_10 <- PICO_class_atc %>%
dplyr::filter(compound_atc_level_3 %in% PICO_atc_10 & species_class %in% PICO_class_10) %>%
dplyr::mutate(compound_atc_level_3 = factor(compound_atc_level_3, levels = PICO_atc_10),
species_class = factor(species_class, levels = PICO_class_10),
behav_cat = factor(behav_cat, levels = behav_cat_order)) %>%
dplyr::select(compound_atc_level_3, behav_cat, species_class)
PICO_atc_class_sankey <- highcharter::hchart(data_to_sankey(PICO_class_atc_10), "sankey")
PICO_atc_class_sankey
setwd(figures_path)
htmlwidgets::saveWidget(widget = PICO_atc_class_sankey, file = "PICO_atc_class_sankey.html")
setwd(figures_path)
# Take a webshot as a PDF: high quality, but the printed zone cannot be chosen
webshot::webshot("PICO_atc_class_sankey.html" , "PICO_atc_class_sankey.pdf", delay = 10)
## NULL
Let’s take a closer look at the 3 most common compounds
EIPAAB_database %>%
dplyr::group_by(compound_name) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:3)
## # A tibble: 3 × 2
## compound_name n
## <chr> <int>
## 1 Fluoxetine 201
## 2 Diazepam 67
## 3 17-alpha-ethinylestradiol 63
spp_order <- PICO_df %>%
dplyr::filter(compound_name == "Fluoxetine") %>%
dplyr::group_by(species_name) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:5) %>%
pull(species_name)
beh_order <- PICO_df %>%
dplyr::filter(compound_name == "Fluoxetine" & species_name %in% spp_order) %>%
dplyr::group_by(behav_cat) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
pull(behav_cat)
PICO_fluoxetine <- PICO_df %>%
dplyr::filter(compound_name == "Fluoxetine" & species_name %in% spp_order) %>%
dplyr::mutate(
species_name = factor(species_name, levels = spp_order),
behav_cat = factor(behav_cat, levels = beh_order),
) %>%
dplyr::select(species_name, behav_cat)
PICO_fluoxetine %>%
dplyr::filter(species_name == "Betta splendens") %>%
dplyr::distinct(behav_cat)
## behav_cat
## 1 Boldness
## 2 Agression
## 3 Mating
## 4 Movement
PICO_fluoxetine_sankey <- highcharter::hchart(data_to_sankey(PICO_fluoxetine), "sankey", name = "PICO")
PICO_fluoxetine_sankey
setwd(figures_path)
htmlwidgets::saveWidget(widget = PICO_fluoxetine_sankey, file = "PICO_fluoxetine_sankey.html")
setwd(figures_path)
# Take a webshot as a PDF: high quality, but the printed zone cannot be chosen
webshot::webshot("PICO_fluoxetine_sankey.html" , "PICO_fluoxetine_sankey.pdf", delay = 10)
## NULL
spp_order <- PICO_df %>%
dplyr::filter(compound_name == "Diazepam") %>%
dplyr::group_by(species_name) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:5) %>%
pull(species_name)
beh_order <- PICO_df %>%
dplyr::filter(compound_name == "Diazepam" & species_name %in% spp_order) %>%
dplyr::group_by(behav_cat) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
pull(behav_cat)
PICO_diazepam <- PICO_df %>%
dplyr::filter(compound_name == "Diazepam" & species_name %in% spp_order) %>%
dplyr::mutate(
species_name = factor(species_name, levels = spp_order),
behav_cat = factor(behav_cat, levels = beh_order),
) %>%
dplyr::select(species_name, behav_cat)
PICO_diazepam_sankey <- highcharter::hchart(data_to_sankey(PICO_diazepam), "sankey", name = "PICO")
PICO_diazepam_sankey
setwd(figures_path)
htmlwidgets::saveWidget(widget = PICO_diazepam_sankey, file = "PICO_diazepam_sankey.html")
setwd(figures_path)
# Take a webshot as a PDF: high quality, but the printed zone cannot be chosen
webshot::webshot("PICO_diazepam_sankey.html" , "PICO_diazepam_sankey.pdf", delay = 10)
## NULL
spp_order <- PICO_df %>%
dplyr::filter(compound_name == "17-alpha-ethinylestradiol") %>%
dplyr::group_by(species_name) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
dplyr::slice(1:5) %>%
pull(species_name)
beh_order <- PICO_df %>%
dplyr::filter(compound_name == "17-alpha-ethinylestradiol" & species_name %in% spp_order) %>%
dplyr::group_by(behav_cat) %>%
dplyr::reframe(n = n()) %>%
dplyr::arrange(desc(n)) %>%
pull(behav_cat)
PICO_EE2 <- PICO_df %>%
dplyr::filter(compound_name == "17-alpha-ethinylestradiol" & species_name %in% spp_order) %>%
dplyr::mutate(
species_name = factor(species_name, levels = spp_order),
behav_cat = factor(behav_cat, levels = beh_order),
) %>%
dplyr::select(species_name, behav_cat)
PICO_EE2_sankey <- highcharter::hchart(data_to_sankey(PICO_EE2), "sankey", name = "PICO")
PICO_EE2_sankey
setwd(figures_path)
htmlwidgets::saveWidget(widget = PICO_EE2_sankey, file = "PICO_EE2_sankey.html")
setwd(figures_path)
# Take a webshot as a PDF: high quality, but the printed zone cannot be chosen
webshot::webshot("PICO_EE2_sankey.html" , "PICO_EE2_sankey.pdf", delay = 10)
## NULL
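The data preparation for the three compound-level sankey plots above is identical apart from the compound name. A sketch of how it could be wrapped in a function is given below; `prep_compound_sankey()` is a hypothetical helper, not part of the original script, and the toy data are made up.

```r
library(dplyr)

# Hypothetical helper (not in the original script): keeps the n_species most
# studied species for one compound and orders both factors by frequency.
prep_compound_sankey <- function(pico_df, compound, n_species = 5) {
  compound_df <- pico_df %>%
    dplyr::filter(compound_name == compound)

  # Most studied species for this compound, most common first
  spp_order <- compound_df %>%
    dplyr::count(species_name, sort = TRUE) %>%
    dplyr::slice_head(n = n_species) %>%
    dplyr::pull(species_name)

  top_df <- compound_df %>%
    dplyr::filter(species_name %in% spp_order)

  # Behaviour categories within those species, most common first
  beh_order <- top_df %>%
    dplyr::count(behav_cat, sort = TRUE) %>%
    dplyr::pull(behav_cat)

  top_df %>%
    dplyr::mutate(species_name = factor(species_name, levels = spp_order),
                  behav_cat = factor(behav_cat, levels = beh_order)) %>%
    dplyr::select(species_name, behav_cat)
}

# Toy data standing in for PICO_df (values are made up)
toy_pico <- tibble::tibble(
  compound_name = c(rep("Fluoxetine", 6), "Diazepam"),
  species_name  = c("Danio rerio", "Danio rerio", "Danio rerio",
                    "Betta splendens", "Betta splendens",
                    "Gambusia holbrooki", "Danio rerio"),
  behav_cat     = c("Movement", "Movement", "Boldness",
                    "Agression", "Movement", "Movement", "Boldness")
)

toy_fluoxetine <- prep_compound_sankey(toy_pico, "Fluoxetine", n_species = 2)
toy_fluoxetine
```

The result can then be passed to `highcharter::hchart(data_to_sankey(...), "sankey", name = "PICO")` in the same way as `PICO_fluoxetine` above.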
Identify knowledge clusters and gaps.
We will also do this by study motivation, because the knowledge gaps will be motivation specific.
First look by species class
Making a dataframe
behav_cat_class_long <- PICO_df %>%
dplyr::group_by(study_motivation, species_class, behav_cat) %>%
dplyr::reframe(count = n()) %>%
tidyr::complete(study_motivation, species_class, behav_cat, fill = list(count = 0)) %>%
dplyr::group_by(study_motivation, species_class) %>%
dplyr::mutate(total_class_motivation = sum(count)) %>%
dplyr::ungroup() %>%
dplyr::mutate(rel_percent = round(count/total_class_motivation*100,0),
rel_percent = if_else(is.finite(rel_percent), rel_percent, 0)
)
class_order <- behav_cat_class_long %>%
dplyr::group_by(species_class) %>%
dplyr::reframe(n = sum(count)) %>%
dplyr::arrange(n) %>%
dplyr::pull(species_class)
behav_cat_order <- behav_cat_class_long %>%
dplyr::group_by(behav_cat) %>%
dplyr::reframe(n = sum(count)) %>%
dplyr::arrange(desc(n)) %>%
dplyr::pull(behav_cat)
behav_cat_class_long <- behav_cat_class_long %>%
dplyr::mutate(species_class = factor(species_class, levels = class_order),
behav_cat = factor(behav_cat, levels = behav_cat_order)
)
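To make the logic of the data frame above easier to follow, here is a small sketch on made-up data. It shows how `complete()` adds the empty combinations (the knowledge gaps) and why the `is.finite()` guard is needed: a class with no rows at all for a motivation gives 0/0, which is `NaN`.

```r
library(dplyr)
library(tidyr)

# Made-up data: there are no Medical rows for Amphibia
toy_pico <- tibble::tibble(
  study_motivation = c("Environmental", "Environmental", "Environmental", "Medical"),
  species_class    = c("Actinopterygii", "Actinopterygii", "Amphibia", "Actinopterygii"),
  behav_cat        = c("Movement", "Boldness", "Movement", "Movement")
)

toy_long <- toy_pico %>%
  dplyr::count(study_motivation, species_class, behav_cat, name = "count") %>%
  # Every motivation x class x behaviour combination, with 0 where there is no data
  tidyr::complete(study_motivation, species_class, behav_cat, fill = list(count = 0L)) %>%
  dplyr::group_by(study_motivation, species_class) %>%
  dplyr::mutate(rel_percent = round(count / sum(count) * 100, 0)) %>%
  dplyr::ungroup() %>%
  # Medical x Amphibia has no rows at all, so 0 / 0 gives NaN; set it to 0
  dplyr::mutate(rel_percent = dplyr::if_else(is.finite(rel_percent), rel_percent, 0))

toy_long

# Number of empty cells (potential knowledge gaps) in the toy data
n_gaps <- sum(toy_long$count == 0)
n_gaps
```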
cust_col <- colorRampPalette(c("#FDEDF4", "#F068A7"))(30)
behav_class_hm <- behav_cat_class_long %>%
ggplot(aes(x = behav_cat, y = species_class, fill = count)) +
geom_tile() +
#geom_text(aes(label = ifelse(count == 0, NA, count)), color = "black", size = 3) +
scale_fill_gradientn(colors = cust_col, na.value = "white", limits = c(1, max(behav_cat_class_long$count, na.rm = TRUE)), guide = "none") +
theme_bw() +
facet_wrap(~study_motivation) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(
x = "Behaviour",
y = "Species Class",
fill = "Count"
)
behav_class_hm
setwd(figures_path)
ggsave("behav_class_hm.pdf", plot = behav_class_hm, width = 8.3, height = 11.7/2)
This one uses relative values for each class (i.e. each row in the heat map)
cust_col <- colorRampPalette(brewer.pal(4, "Oranges"))(30)
behav_class_rel_hm <- behav_cat_class_long %>%
ggplot(aes(x = behav_cat, y = species_class, fill = rel_percent)) +
geom_tile() +
#geom_text(aes(label = ifelse(rel_percent == 0, NA, rel_percent)), color = "black", size = 3) +
# rel_percent is a percentage, so the scale runs from 1 to 100 (zeros fall outside the limits and are drawn white)
scale_fill_gradientn(colors = cust_col, na.value = "white", limits = c(1, 100), guide = "none") +
theme_bw() +
facet_wrap(~study_motivation) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(
x = "Behaviour",
y = "Species Class",
fill = "Relative percent"
)
behav_class_rel_hm
setwd(figures_path)
ggsave("behav_class_rel_hm.pdf", plot = behav_class_rel_hm, width = 8.3, height = 11.7/2)
Now looking by compound
Making a dataframe
behav_atc_long <- PICO_df %>%
separate_rows(compound_atc_level_3, sep = ";") %>%
dplyr::group_by(study_motivation, compound_atc_level_3, behav_cat) %>%
dplyr::reframe(count = n()) %>%
tidyr::complete(study_motivation, compound_atc_level_3, behav_cat, fill = list(count = 0)) %>%
dplyr::group_by(study_motivation, compound_atc_level_3) %>%
dplyr::mutate(total_atc_motivation = sum(count)) %>%
dplyr::ungroup() %>%
dplyr::mutate(rel_percent = round(count/total_atc_motivation*100,0),
rel_percent = if_else(is.finite(rel_percent), rel_percent, 0)
)
atc_order <- behav_atc_long %>%
dplyr::group_by(compound_atc_level_3) %>%
dplyr::reframe(n = sum(count)) %>%
dplyr::arrange(n) %>%
dplyr::pull(compound_atc_level_3)
behav_cat_order <- behav_atc_long %>%
dplyr::group_by(behav_cat) %>%
dplyr::reframe(n = sum(count)) %>%
dplyr::arrange(desc(n)) %>%
dplyr::pull(behav_cat)
behav_atc_long <- behav_atc_long %>%
dplyr::mutate(compound_atc_level_3 = factor(compound_atc_level_3, levels = atc_order),
behav_cat = factor(behav_cat, levels = behav_cat_order)
)
behav_atc_long %>%
dplyr::distinct(compound_atc_level_3)
## # A tibble: 132 × 1
## compound_atc_level_3
## <fct>
## 1 a01a stomatological preparations
## 2 a02b drugs for peptic ulcer and gastro-oesophageal reflux disease (gord)
## 3 a03b belladonna and derivatives, plain
## 4 a03f propulsives
## 5 a04a antiemetics and antinauseants
## 6 a06a drugs for constipation
## 7 a07a intestinal antiinfectives
## 8 a07e intestinal antiinflammatory agents
## 9 a08a antiobesity preparations, excl. diet products
## 10 a10b blood glucose lowering drugs, excl. insulins
## # ℹ 122 more rows
cust_col <- colorRampPalette(c("#FDEDF4", "#F068A7"))(30)
behav_atc_hm <- behav_atc_long %>%
ggplot(aes(x = behav_cat, y = compound_atc_level_3, fill = count)) +
geom_tile() +
#geom_text(aes(label = ifelse(count == 0, NA, count)), color = "black", size = 3) +
scale_fill_gradientn(colors = cust_col, na.value = "white", limits = c(1, max(behav_atc_long$count, na.rm = TRUE)), guide = "none") +
theme_void() +
facet_wrap(~study_motivation) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(
x = "Behaviour",
y = "ATC level 3",
fill = "Count"
)
behav_atc_hm
setwd(figures_path)
ggsave("behav_atc_hm.pdf", plot = behav_atc_hm, width = 8.3, height = 11.7/2)
This one uses relative values for each ATC group (i.e. each row in the heat map)
cust_col <- colorRampPalette(brewer.pal(4, "Oranges"))(30)
behav_atc_rel_hm <- behav_atc_long %>%
ggplot(aes(x = behav_cat, y = compound_atc_level_3, fill = rel_percent)) +
geom_tile() +
#geom_text(aes(label = ifelse(rel_percent == 0, NA, rel_percent)), color = "black", size = 3) +
# rel_percent is a percentage, so the scale runs from 1 to 100 (zeros are drawn white)
scale_fill_gradientn(colors = cust_col, na.value = "white", limits = c(1, 100), guide = "none") +
theme_void() +
facet_wrap(~study_motivation) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(
x = "Behaviour",
y = "ATC level 3",
fill = "Relative percent"
)
behav_atc_rel_hm
setwd(figures_path)
ggsave("behav_atc_rel_hm.pdf", plot = behav_atc_rel_hm, width = 8.3, height = 11.7/2)
EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::group_by(additional_biomarkers) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(total = sum(n),
percent = n/total)
## # A tibble: 2 × 4
## additional_biomarkers n total percent
## <chr> <int> <int> <dbl>
## 1 No 435 901 0.483
## 2 Yes 466 901 0.517
EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::group_by(validity_survival_growth_reproduction) %>%
dplyr::reframe(n = n()) %>%
dplyr::mutate(total = sum(n),
percent = n/total)
## # A tibble: 2 × 4
## validity_survival_growth_reproduction n total percent
## <chr> <int> <int> <dbl>
## 1 No 543 901 0.603
## 2 Yes 358 901 0.397
This is a list of all 19 metadata columns that relate to our validity information.
c("validity_guideline", "validity_good_laboratory_practice", "validity_survival_growth_reproduction", "validity_animal_feeding", "validity_water_quality", "validity_light_cycle", "validity_randomization", "validity_behav_scoring_method", "validity_behav_blinding", "validity_conflict_statement", "species_source", "species_stage", "species_sex", "compound_min_duration_exposure", "compound_max_duration_exposure", "validity_compound_cas_reported", "validity_compound_purity_reported", "validity_compound_water_verification", "validity_compound_animal_verification")
guideline <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_guideline, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
guideline_all <- guideline %>%
dplyr::group_by(validity_guideline) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
guideline_all <- rbind(guideline_all, guideline) %>%
dplyr::filter(validity_guideline == "Yes")
guideline_all
## # A tibble: 4 × 4
## validity_guideline n study_motivation percent
## <chr> <int> <chr> <dbl>
## 1 Yes 135 Overall 15
## 2 Yes 111 Environmental 21.8
## 3 Yes 14 Medical 6
## 4 Yes 10 Basic research 6.4
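The guideline summary above follows a pattern that is repeated for several of the Yes/No validity columns. As a sketch only, the pattern could be written once as a function and applied to each column. `summarise_validity()` is a hypothetical helper, not part of the original script; it takes the first row per article rather than a random one, renames the chosen column to `answer`, and is shown here on made-up data.

```r
library(dplyr)

# Hypothetical helper (not in the original script): number and percentage of
# articles answering "Yes" for one validity column, overall and by motivation.
summarise_validity <- function(db, column) {
  by_motivation <- db %>%
    dplyr::rename(answer = dplyr::all_of(column)) %>%
    dplyr::group_by(article_id) %>%
    dplyr::slice_head(n = 1) %>%   # first row per article (deterministic)
    dplyr::ungroup() %>%
    dplyr::count(answer, study_motivation) %>%
    dplyr::group_by(study_motivation) %>%
    dplyr::mutate(percent = round(n / sum(n) * 100, 1)) %>%
    dplyr::ungroup()

  overall <- by_motivation %>%
    dplyr::group_by(answer) %>%
    dplyr::summarise(n = sum(n), .groups = "drop") %>%
    dplyr::mutate(study_motivation = "Overall",
                  percent = round(n / sum(n) * 100, 1))

  dplyr::bind_rows(overall, by_motivation) %>%
    dplyr::filter(answer == "Yes")
}

# Toy data standing in for EIPAAB_database (values are made up)
toy_db <- tibble::tibble(
  article_id = c("a1", "a1", "a2", "a3", "a4"),
  study_motivation = c("Environmental", "Environmental", "Environmental",
                       "Medical", "Medical"),
  validity_guideline = c("Yes", "Yes", "No", "No", "Yes"),
  validity_good_laboratory_practice = c("No", "No", "No", "No", "Yes")
)

validity_cols <- c("validity_guideline", "validity_good_laboratory_practice")
validity_tables <- lapply(validity_cols, function(col) summarise_validity(toy_db, col))
names(validity_tables) <- validity_cols

validity_tables
```

This only makes sense for columns coded as "Yes"/"No"; columns such as `species_sex` or the exposure durations would need their own summaries.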
GLP <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_good_laboratory_practice, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
GLP_all <- GLP %>%
dplyr::group_by(validity_good_laboratory_practice) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
GLP_all <- rbind(GLP_all, GLP) %>%
dplyr::filter(validity_good_laboratory_practice == "Yes")
GLP_all
## # A tibble: 3 × 4
## validity_good_laboratory_practice n study_motivation percent
## <chr> <int> <chr> <dbl>
## 1 Yes 6 Overall 0.7
## 2 Yes 5 Environmental 1
## 3 Yes 1 Medical 0.4
survival_growth_reproduction <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_survival_growth_reproduction, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
survival_growth_reproduction_all <- survival_growth_reproduction %>%
dplyr::group_by(validity_survival_growth_reproduction) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
survival_growth_reproduction_all <- rbind(survival_growth_reproduction_all, survival_growth_reproduction) %>%
dplyr::filter(validity_survival_growth_reproduction == "Yes")
survival_growth_reproduction_all
## # A tibble: 4 × 4
## validity_survival_growth_reproduction n study_motivation percent
## <chr> <int> <chr> <dbl>
## 1 Yes 358 Overall 39.7
## 2 Yes 273 Environmental 53.5
## 3 Yes 61 Medical 26.2
## 4 Yes 24 Basic research 15.2
CAS <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_compound_cas_reported, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
CAS_all <- CAS %>%
dplyr::group_by(validity_compound_cas_reported) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
CAS_all <- rbind(CAS_all, CAS) %>%
dplyr::filter(validity_compound_cas_reported == "Yes")
CAS_all
## # A tibble: 4 × 4
## validity_compound_cas_reported n study_motivation percent
## <chr> <int> <chr> <dbl>
## 1 Yes 222 Overall 24.6
## 2 Yes 186 Environmental 36.5
## 3 Yes 25 Medical 10.7
## 4 Yes 11 Basic research 7
purity <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_compound_purity_reported, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
purity_all <- purity %>%
dplyr::group_by(validity_compound_purity_reported) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
purity_all <- rbind(purity_all, purity) %>%
dplyr::filter(validity_compound_purity_reported == "Yes")
purity_all
## # A tibble: 4 × 4
## validity_compound_purity_reported n study_motivation percent
## <chr> <int> <chr> <dbl>
## 1 Yes 230 Overall 25.5
## 2 Yes 200 Environmental 39.2
## 3 Yes 21 Medical 9
## 4 Yes 9 Basic research 5.7
stage <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(species_stage, study_motivation) %>%
dplyr::reframe(n = n()) %>%
tidyr::separate_rows(species_stage, sep = ";") %>%
dplyr::group_by(species_stage, study_motivation) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
stage_all <- stage %>%
dplyr::group_by(species_stage) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
stage_all <- rbind(stage_all, stage) %>%
dplyr::filter(species_stage == "Unknown or not specified") %>%
dplyr::mutate(percent_reported = 100-percent)
stage_all
## # A tibble: 4 × 5
## species_stage n study_motivation percent percent_reported
## <chr> <int> <chr> <dbl> <dbl>
## 1 Unknown or not specified 166 Overall 16.6 83.4
## 2 Unknown or not specified 100 Environmental 17.1 82.9
## 3 Unknown or not specified 22 Medical 8.9 91.1
## 4 Unknown or not specified 44 Basic research 26.7 73.3
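The `species_stage`, `species_sex` and `species_source` blocks pass through `tidyr::separate_rows()` with `sep = ";"`, which suggests a single cell can hold several semicolon-separated values. The sketch below uses made-up toy values (not taken from the database) to show how that step turns one population row into several. After splitting, the percentages are taken over stage entries rather than over populations.

```r
library(dplyr)
library(tidyr)

# Toy data standing in for EIPAAB_database (made-up values, for illustration only)
toy <- data.frame(
  unique_population_id = c("a", "b", "c"),
  species_stage = c("Adult;Juvenile", "Adult", "Unknown or not specified")
)

# One row per stage entry: population "a" contributes two rows
long <- toy %>%
  tidyr::separate_rows(species_stage, sep = ";")

# Count entries per stage and express them as a share of all entries
counts <- long %>%
  dplyr::count(species_stage, name = "n") %>%
  dplyr::mutate(percent = round(n / sum(n) * 100, 1))
counts
```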
sex <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(species_sex, study_motivation) %>%
dplyr::reframe(n = n()) %>%
tidyr::separate_rows(species_sex, sep = ";") %>%
dplyr::group_by(species_sex, study_motivation) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
sex_all <- sex %>%
dplyr::group_by(species_sex) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
sex_all <- rbind(sex_all, sex) %>%
dplyr::filter(species_sex == "Unknown or not specified") %>%
dplyr::mutate(percent_reported = 100-percent)
sex_all
## # A tibble: 4 × 5
## species_sex n study_motivation percent percent_reported
## <chr> <int> <chr> <dbl> <dbl>
## 1 Unknown or not specified 546 Overall 46.5 53.5
## 2 Unknown or not specified 325 Environmental 50.4 49.6
## 3 Unknown or not specified 132 Medical 40.9 59.1
## 4 Unknown or not specified 89 Basic research 43.4 56.6
source <- EIPAAB_database %>%
dplyr::group_by(unique_population_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(species_source, study_motivation) %>%
dplyr::reframe(n = n()) %>%
tidyr::separate_rows(species_source, sep = ";") %>%
dplyr::group_by(species_source, study_motivation) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
source_all <- source %>%
dplyr::group_by(species_source) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
source_all <- rbind(source_all, source) %>%
dplyr::filter(species_source == "Not reported") %>%
dplyr::mutate(percent_reported = 100-percent)
source_all
## # A tibble: 4 × 5
## species_source n study_motivation percent percent_reported
## <chr> <int> <chr> <dbl> <dbl>
## 1 Not reported 148 Overall 15.6 84.4
## 2 Not reported 72 Environmental 13.1 86.9
## 3 Not reported 51 Medical 21.2 78.8
## 4 Not reported 25 Basic research 15.4 84.6
feeding <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_animal_feeding, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
feeding_all <- feeding %>%
dplyr::group_by(validity_animal_feeding) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
feeding_all <- rbind(feeding_all, feeding) %>%
dplyr::filter(validity_animal_feeding == "Yes")
feeding_all
## # A tibble: 4 × 4
## validity_animal_feeding n study_motivation percent
## <chr> <int> <chr> <dbl>
## 1 Yes 716 Overall 79.5
## 2 Yes 430 Environmental 84.3
## 3 Yes 160 Medical 68.4
## 4 Yes 126 Basic research 80.3
water <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_water_quality, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
water_all <- water %>%
dplyr::group_by(validity_water_quality) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
water_all <- rbind(water_all, water) %>%
dplyr::filter(validity_water_quality == "Yes")
water_all
## # A tibble: 4 × 4
## validity_water_quality n study_motivation percent
## <chr> <int> <chr> <dbl>
## 1 Yes 806 Overall 89.5
## 2 Yes 473 Environmental 92.7
## 3 Yes 207 Medical 88.5
## 4 Yes 126 Basic research 80.3
light <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_light_cycle, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
light_all <- light %>%
dplyr::group_by(validity_light_cycle) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
light_all <- rbind(light_all, light) %>%
dplyr::filter(validity_light_cycle == "Yes")
light_all
## # A tibble: 4 × 4
## validity_light_cycle n study_motivation percent
## <chr> <int> <chr> <dbl>
## 1 Yes 756 Overall 83.9
## 2 Yes 429 Environmental 84.1
## 3 Yes 200 Medical 85.8
## 4 Yes 127 Basic research 80.4
min_duration <- EIPAAB_database %>%
dplyr::group_by(compound_min_duration_exposure, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
min_duration_all <- min_duration %>%
dplyr::group_by(compound_min_duration_exposure) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
min_duration_all <- rbind(min_duration_all, min_duration) %>%
dplyr::filter(compound_min_duration_exposure == "Not stated") %>%
dplyr::mutate(percent_reported = 100-percent)
min_duration_all
## # A tibble: 4 × 5
## compound_min_duration_exposure n study_motivation percent percent_reported
## <chr> <int> <chr> <dbl> <dbl>
## 1 Not stated 102 Overall 5.9 94.1
## 2 Not stated 38 Environmental 4.4 95.6
## 3 Not stated 52 Medical 10 90
## 4 Not stated 12 Basic research 3.3 96.7
max_duration <- EIPAAB_database %>%
dplyr::group_by(compound_max_duration_exposure, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
max_duration_all <- max_duration %>%
dplyr::group_by(compound_max_duration_exposure) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
max_duration_all <- rbind(max_duration_all, max_duration) %>%
dplyr::filter(compound_max_duration_exposure == "Not stated") %>%
dplyr::mutate(percent_reported = 100-percent)
max_duration_all
## # A tibble: 4 × 5
## compound_max_duration_exposure n study_motivation percent percent_reported
## <chr> <int> <chr> <dbl> <dbl>
## 1 Not stated 96 Overall 5.5 94.5
## 2 Not stated 32 Environmental 3.7 96.3
## 3 Not stated 53 Medical 10.2 89.8
## 4 Not stated 11 Basic research 3 97
water_verification <- EIPAAB_database %>%
dplyr::filter(!is.na(validity_compound_water_verification)) %>%
dplyr::group_by(validity_compound_water_verification, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
water_verification_all <- water_verification %>%
dplyr::group_by(validity_compound_water_verification) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
water_verification_all <- rbind(water_verification_all, water_verification) %>%
dplyr::filter(validity_compound_water_verification == "Measured")
water_verification_all
## # A tibble: 4 × 4
## validity_compound_water_verification n study_motivation percent
## <chr> <int> <chr> <dbl>
## 1 Measured 313 Overall 20.6
## 2 Measured 295 Environmental 35.8
## 3 Measured 10 Medical 2.5
## 4 Measured 8 Basic research 2.7
tissue_verification <- EIPAAB_database %>%
#dplyr::filter(!is.na(validity_compound_animal_verification)) %>%
dplyr::group_by(validity_compound_animal_verification, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
tissue_verification_all <- tissue_verification %>%
dplyr::group_by(validity_compound_animal_verification) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
tissue_verification_all <- rbind(tissue_verification_all, tissue_verification) %>%
dplyr::filter(validity_compound_animal_verification == "Yes")
tissue_verification_all
## # A tibble: 4 × 4
## validity_compound_animal_verification n study_motivation percent
## <chr> <int> <chr> <dbl>
## 1 Yes 154 Overall 8.9
## 2 Yes 115 Environmental 13.4
## 3 Yes 22 Medical 4.2
## 4 Yes 17 Basic research 4.7
randomization <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_randomization, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
randomization_all <- randomization %>%
dplyr::group_by(validity_randomization) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
randomization_all <- rbind(randomization_all, randomization) %>%
dplyr::filter(validity_randomization == "Yes") %>%
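# NOTE: percent is the share of "Yes" rows, so 100 - percent is the share without a "Yes", despite the column name
dplyr::mutate(percent_disclosed = 100-percent)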
randomization_all
## # A tibble: 4 × 5
## validity_randomization n study_motivation percent percent_disclosed
## <chr> <int> <chr> <dbl> <dbl>
## 1 Yes 362 Overall 40.2 59.8
## 2 Yes 229 Environmental 44.9 55.1
## 3 Yes 75 Medical 32.2 67.8
## 4 Yes 58 Basic research 36.7 63.3
blinding <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_behav_blinding, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
blinding_all <- blinding %>%
dplyr::group_by(validity_behav_blinding) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
blinding_all <- rbind(blinding_all, blinding) %>%
dplyr::filter(validity_behav_blinding == "Yes")
blinding_all
## # A tibble: 4 × 4
## validity_behav_blinding n study_motivation percent
## <chr> <int> <chr> <dbl>
## 1 Yes 153 Overall 17
## 2 Yes 75 Environmental 14.7
## 3 Yes 44 Medical 18.9
## 4 Yes 34 Basic research 21.5
behav_scoring <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
tidyr::separate_rows(validity_behav_scoring_method, sep = ";") %>%
dplyr::group_by(validity_behav_scoring_method, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
behav_scoring_all <- behav_scoring %>%
dplyr::group_by(validity_behav_scoring_method) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
behav_scoring_all <- rbind(behav_scoring_all, behav_scoring) %>%
dplyr::filter(validity_behav_scoring_method == "not specified") %>%
dplyr::mutate(percent_specified = 100-percent)
behav_scoring_all
## # A tibble: 4 × 5
## validity_behav_scoring_method n study_motivation percent percent_specified
## <chr> <int> <chr> <dbl> <dbl>
## 1 not specified 221 Overall 22.7 77.3
## 2 not specified 130 Environmental 24 76
## 3 not specified 56 Medical 21.5 78.5
## 4 not specified 35 Basic research 20.7 79.3
conflict <- EIPAAB_database %>%
dplyr::group_by(article_id) %>%
dplyr::sample_n(1) %>%
dplyr::ungroup() %>%
dplyr::group_by(validity_conflict_statement, study_motivation) %>%
dplyr::reframe(n = n()) %>%
dplyr::group_by(study_motivation) %>%
dplyr::mutate(total_motivation = sum(n)) %>%
dplyr::ungroup() %>%
dplyr::mutate(percent = round(n/total_motivation*100,1)) %>%
dplyr::select(-total_motivation)
conflict_all <- conflict %>%
dplyr::group_by(validity_conflict_statement) %>%
dplyr::reframe(n = sum(n)) %>%
dplyr::mutate(study_motivation = "Overall",
percent = round(n/sum(n)*100,1))
conflict_all <- rbind(conflict_all, conflict) %>%
dplyr::filter(validity_conflict_statement == "No statement is made in the paper") %>%
dplyr::mutate(percent_specified = 100-percent)
conflict_all
## # A tibble: 4 × 5
## validity_conflict_statement n study_motivation percent percent_specified
## <chr> <int> <chr> <dbl> <dbl>
## 1 No statement is made in the … 407 Overall 45.2 54.8
## 2 No statement is made in the … 254 Environmental 49.8 50.2
## 3 No statement is made in the … 65 Medical 27.8 72.2
## 4 No statement is made in the … 88 Basic research 56.1 43.9
# pander for making it look nicer
sessionInfo() %>% pander()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
locale: en_US.UTF-8||en_US.UTF-8||en_US.UTF-8||C||en_US.UTF-8||en_US.UTF-8
attached base packages: stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: pander(v.0.6.5), highcharter(v.0.9.4), ggdist(v.3.3.2), gridExtra(v.2.3), ape(v.5.8), treeio(v.1.28.0), ggtree(v.3.12.0), RColorBrewer(v.1.1-3), ggrepel(v.0.9.5), igraph(v.2.0.3), ggraph(v.2.2.1), lubridate(v.1.9.3), forcats(v.1.0.0), stringr(v.1.5.1), dplyr(v.1.1.4), purrr(v.1.0.2), readr(v.2.1.5), tidyr(v.1.3.1), tibble(v.3.2.1), ggplot2(v.3.5.1) and tidyverse(v.2.0.0)
loaded via a namespace (and not attached): tidyselect(v.1.2.1), viridisLite(v.0.4.2), farver(v.2.1.2), viridis(v.0.6.5), fastmap(v.1.2.0), lazyeval(v.0.2.2), tweenr(v.2.0.3), pacman(v.0.5.1), digest(v.0.6.36), timechange(v.0.3.0), lifecycle(v.1.0.4), tidytree(v.0.4.6), magrittr(v.2.0.3), compiler(v.4.4.1), rlang(v.1.1.4), sass(v.0.4.9), tools(v.4.4.1), utf8(v.1.2.4), yaml(v.2.3.10), data.table(v.1.15.4), knitr(v.1.48), labeling(v.0.4.3), htmlwidgets(v.1.6.4), graphlayouts(v.1.1.1), curl(v.5.2.1), TTR(v.0.24.4), aplot(v.0.2.3), withr(v.3.0.1), grid(v.4.4.1), polyclip(v.1.10-7), fansi(v.1.0.6), xts(v.0.14.0), colorspace(v.2.1-1), scales(v.1.3.0), MASS(v.7.3-60.2), cli(v.3.6.3), rmarkdown(v.2.27), ragg(v.1.3.2), generics(v.0.1.3), rlist(v.0.4.6.2), rstudioapi(v.0.16.0), tzdb(v.0.4.0), cachem(v.1.1.0), ggforce(v.0.4.2), assertthat(v.0.2.1), parallel(v.4.4.1), ggplotify(v.0.1.2), vctrs(v.0.6.5), yulab.utils(v.0.1.5), webshot(v.0.5.5), jsonlite(v.1.8.8), gridGraphics(v.0.5-1), hms(v.1.1.3), patchwork(v.1.2.0), systemfonts(v.1.1.0), jquerylib(v.0.1.4), quantmod(v.0.4.26), glue(v.1.7.0), codetools(v.0.2-20), distributional(v.0.4.0), stringi(v.1.8.4), gtable(v.0.3.5), munsell(v.0.5.1), pillar(v.1.9.0), htmltools(v.0.5.8.1), R6(v.2.5.1), textshaping(v.0.4.0), tidygraph(v.1.3.1), evaluate(v.0.24.0), lattice(v.0.22-6), highr(v.0.11), backports(v.1.5.0), memoise(v.2.0.1), broom(v.1.0.6), ggfun(v.0.1.5), bslib(v.0.8.0), Rcpp(v.1.0.13), nlme(v.3.1-164), xfun(v.0.46), zoo(v.1.8-12), fs(v.1.6.4) and pkgconfig(v.2.0.3)